Abstract. There exists a wide literature on parametrically or semi-parametrically modeling strongly dependent time series using a long-memory parameter $d$, including more recent work on wavelet estimation. As a generalization of these latter approaches, in this work we allow the long-memory parameter $d$ to vary over time. We adopt a semi-parametric approach in order to avoid fitting a time-varying parametric model, such as tvARFIMA, to the observed data. We study the asymptotic behavior of a local log-regression wavelet estimator of the time-dependent $d$. Both simulations and a real data example complete our work on providing a fairly general approach.



1. Introduction 

There is a long tradition of modelling the phenomenon of long-range dependence in observed data that show a strong persistence of their correlations by long-memory processes. Such data can typically be found in the applied sciences such as hydrology, geophysics, climatology and telecommunication (e.g. teletraffic data), but recently also in economics and in finance, e.g. for modelling (realized) volatility of exchange rate data or stocks. The literature on stationary long-memory processes is huge (see e.g. the references in the recent survey paper [13]), and we concentrate here on the discussion of long-range dependence resulting from a singularity of the spectral density at zero frequency, corresponding to a slow, i.e. polynomial, decay of the autocorrelation of the data. Although a lot of (earlier) work started from a parametric approach, using e.g. the celebrated ARFIMA-like models, it turns out that, since the seminal work by P. Robinson (see [?]), the semi-parametric approach is known to be more robust against model misspecification. Instead of using a parametric filter describing both the singularity of the spectral density at zero frequency and the ARMA dynamics of the short-memory part, only the singular behavior of the spectrum at zero is modelled by the long-memory parameter, $d$ say, whereas the short-memory part remains completely non-parametric.

Driven by the empirical observation that the correlation structure of observed (weakly or strongly dependent) data can change over time, there is also a growing literature on modelling departures from covariance stationarity, mainly restricted to the short-range dependent case. One prominent approach, which we also adopt in this paper, is the model of local stationarity, introduced in a series of papers by R. Dahlhaus ([?]). In a non-parametric set-up, the spectral structure of the underlying stochastic process is allowed to vary smoothly over time. Of course, time-varying linear processes (of ARMA type) arise as a subclass of these locally stationary processes. In order to come up with a rigorous asymptotic theory of consistency and inference, the time dependence of the spectral density $f(u,\lambda)$ of such locally stationary processes is modelled in rescaled time $u \in [0,1]$. This leads to a problem of non-parametric curve estimation: increasing the sample size $T$ of the observed time series no longer means looking into the future. Instead, it means having more and more observations available to identify $f(t/T,\lambda)$ locally around the "reference" rescaled time point $u \approx t/T$.

In the aforementioned spirit of semi-parametric modelling, and in contrast to the parametric approach of [?] (one of the very few existing approaches to time-varying long-memory modelling), we consider in this paper a locally stationary long-range dependent process with a singularity in the spectral density at zero frequency. This singularity is parameterized by a time-varying long-memory parameter $d = d(u)$, $u \in [0,1]$, i.e. defined in rescaled time. Our approach is a true generalization of the stationary approach in that the latter corresponds to a time-constant $d$ in our locally stationary model. As in [23], the long-memory parameter is estimated by a log-regression of a series of wavelet scalograms onto a range of scales (corresponding to the low-frequency range of the spectrum). The scalograms are estimated wavelet variances per scale, obtained by summing the squared wavelet coefficients per scale over location. Although wavelets do not improve the estimation of $d$ in the standard stationary context $-1/2 < d < 1/2$, their use is of interest in various practical situations: presence of trends, and under- and over-differenced series leading to $d > 1/2$ and $d < -1/2$, respectively (see details in [13]). However, in our work the challenge is to localize the estimation of the parameter $d$, which is no longer constant. Wavelets are favorable in this situation since, in contrast to a Fourier analysis, they are well localized both in time and in frequency, i.e. scale. The localization is achieved by smoothing over time the series of squared wavelet coefficients on each of the coarse scales that enter the log-regression, giving rise to a local scalogram. We propose both a more traditional method based on two-sided kernels and a recursive scheme of one-sided smoothing weights, adapted to the end point of the observation period.

The model studied in this paper arises naturally in the now long history of time series modelling in the presence of persistent memory. A survey on this subject is provided in [?]. Chapter 3 of this reference recalls how long memory and non-stationarity have been used as concurrent modelling approaches, in particular for financial data (see e.g. [19]). Long-memory modelling for financial data goes back to [13]. Originally investigated on absolute powers of stock returns, long-memory models are currently widely used for realized volatility data, since they were proven efficient for forecasting purposes in this context in [?]. It is interesting to note that only 3 years after [12], the need for time-varying long-memory parameters was pointed out in [11]. See Section 5 of that reference, and in particular Figure 6, where an estimated $d$ is plotted evolving between the values 0.358 and 0.714 over a 60-year period. In this reference, two approaches are suggested for coping with a time-varying $d$: a stochastically evolving $d$, or a regime switching between, say, two values of $d$. These two approaches have been developed, respectively, in [24] and [17]. See also [?], where singularities in the spectral density at frequencies different from zero are considered in a piecewise stationary context.

As outlined earlier, the alternative approach developed by Dahlhaus for locally stationary (short-memory) processes is quite appealing, as it allows a meaningful asymptotic study of the estimators. It was recently used for volatility estimation using time-varying (short-memory) non-linear processes, see [3, 15]. The first attempt to adapt Dahlhaus's approach in the presence of long memory appears to be the unpublished preprint [?]. To estimate the time-varying long-memory parameter, the authors use the log-linear relationship between the local variance of the maximum overlap discrete wavelet transform and its scaling parameter, together with a localization by a rectangular window in the coefficient domain. However, no asymptotic analysis of the proposed estimator is provided. Although it is not essential in their analysis, the considered model is a tvARFIMA$(q,d,r)$ (see our examples below). An asymptotic analysis is provided in [?] for a different estimator applied to the same model.

Roughly speaking, the standard way to estimate a time-varying parameter $d(u)$ at rescaled time $u = t/T \in (0,1)$ of a locally stationary process is the following. For a sample size $T$, the $Tb$ observations around time index $t$ approximately behave as a stationary sample as the bandwidth parameter $b \to 0$. In [?, Theorem 1], the proposed estimator of the time-varying long-memory parameter $d$ is claimed to satisfy a central limit theorem at rate $(Tb)^{-1/2}$. On the other hand, Theorem 2 in [?] says that the bias is of order $b^2$ if $d$ is twice continuously differentiable. Such results are similar to those for estimating the time-varying parameter of a locally stationary short-memory process, see [20] or, in a more general fashion, Example 3.6 in [?]. Hence the presence of long memory in [?] does not seem to affect the estimation rates.

This can be explained by the parametric approach to the correlation structure of the observed locally stationary time series. The filter in the linear (although infinite) local autoregressive representation of the process is completely determined by a finite-dimensional parameter. It follows that only a finite number of local correlations are needed to determine the local parameter $d$. In other words, $d(u)$ can be estimated from an analysis over a fixed set of frequencies bounded away from zero. This would, however, induce a high sensitivity to model misspecification. In contrast, as usual in semi-parametric versus parametric estimation, we want our approach to be robust against model misspecification. To impose this robustness in the semi-parametric model, the memory parameter $d$ is disconnected from the spectral content away from the origin, so that short-range correlations do not carry any information about this parameter.

This fully justifies the semi-parametric context, even though it is more involved: it necessitates a low-frequency analysis (where the long-memory behavior occurs), which at first sight seems contradictory to the local stationarity framework. In fact, this contradiction is inherent to any locally stationary model, which relies on a compromise between two requirements. Stationarity appears only at small scales, whereas the analyzing bandwidth requires a large scale to decrease the randomness in the data. As a consequence, practical applications of these models require very long data sets. Our results will prove that this apparent contradiction can still be overcome for locally stationary long-memory models, with some price to pay on the rate of estimation (although we are unable to prove that this price is optimal).

This is not as surprising as it may appear. To understand why, consider a piecewise stationary context in which a finite number of regime switching times occur over the observation sample. One clearly sees that the long memory over each stationary segment can be estimated at the same rate as in the stationary context. As we will see, the picture is more complicated in a locally stationary context, but it can still be handled. More precisely, we will show that looking at low frequencies is allowed in a locally stationary model, at an additional cost on the rate of convergence that depends on how small the frequency used for the analysis is. We believe that such theoretical results are crucial for the practical estimation of the time-varying long-memory parameter: they demonstrate the viability of such an approach while indicating that it should be applied with care.

The rest of the paper is organized as follows.

In Section 2 we give the technical details of our locally stationary long-memory model of semi-parametric type, together with a series of examples of processes falling into this model.

In Section 3 we define our estimators based on wavelet analysis, for which we briefly recall the wavelet set-up. We define the local scalogram, which is at the heart of our wavelet-based estimators. We also prepare our technique of stationary approximation by defining what we call the approximating stationary tangent process, its wavelet spectrum (the local wavelet spectrum), and the pseudo-estimator tangent scalogram. We finish this section by discussing a series of smoothing weights, one- and two-sided kernels, which fulfill our assumptions.

The asymptotic properties of our proposed estimators are stated in Section 4. We derive a mean-square approximation of the local scalogram by the tangent scalogram (Proposition 1). This is followed by a control of the mean square error of the scalogram as an estimator of the local wavelet spectrum (Theorem 1) and a CLT for the tangent scalogram (Theorem 2), which finally allows us to derive a CLT for the local scalogram (Corollary 1). The results on the asymptotic behavior of the estimator of $d(u)$ are then obtained: Corollary 2 provides the rate of convergence and Theorem 3 the asymptotic normality.

Section 5 is devoted to numerical examples. First, we simulate an ARFIMA process with a time-varying $d$ and compare the performance of the two-sided (rectangular) kernel with the recursive weight scheme. Second, we apply the kernel estimator to a series of realized log volatilities (see also [?]), namely those of the exchange rate of the YEN versus USD from June 1986 to September 2004.

We conclude in Section 6. An appendix section presents all technical details of our derivations, including our proofs.



2. Model set-up and examples 

Define the difference operator $[\Delta X]_t = X_t - X_{t-1}$ and $\Delta^p$ recursively. This will allow $d(u)$ to take values up to $p + 1/2$ in the following model.
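The operator $\Delta^p$ can be sketched in Python with NumPy. The helper name `diff_p` is hypothetical. It applies $\Delta$ $p$ times, so the first $p$ time points are lost. As a quick sanity check, $\Delta^p$ maps a polynomial trend of degree $p-1$ to zero.

```python
import numpy as np

def diff_p(x, p):
    """Apply [Delta X]_t = X_t - X_{t-1} p times (hypothetical helper).

    Returns len(x) - p values, i.e. (Delta^p X)_t for t = p+1, ..., T (1-based),
    since the earlier values would require unobserved X_t with t <= 0.
    """
    x = np.asarray(x, dtype=float)
    for _ in range(p):
        x = x[1:] - x[:-1]
    return x

# Delta^2 removes a linear trend
t = np.arange(1, 11, dtype=float)
print(diff_p(3.0 + 0.5 * t, 2))  # eight zeros
```

For $p \ge 1$ this coincides with NumPy's `np.diff(x, n=p)`. The explicit loop simply mirrors the recursive definition.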

We adapt the approach of [?] to the case where the spectral density is allowed to have a singularity at the zero frequency. Let us fix $p = 0, 1, 2, \dots$ and let $A^0_{t,T}(\lambda)$ be an array of $L^2([-\pi,\pi])$ functions with real-valued Fourier coefficients. Let $\{X_{t,T}\}$ be an array of real-valued random variables such that

$$\Delta^p X_{t,T} = \int_{-\pi}^{\pi} A^0_{t,T}(\lambda)\, e^{i\lambda t}\, dZ(\lambda), \quad t = 1, \dots, T,\ T \ge 1, \qquad (1)$$

where $dZ(\lambda)$ is the spectral representation of a centered weak white noise with unit variance,

$$\epsilon_t = \int_{-\pi}^{\pi} e^{i\lambda t}\, dZ(\lambda), \quad t \in \mathbb{Z}. \qquad (2)$$

Hence $Z(\lambda)$ is a Hermitian complex-valued process with weakly stationary orthogonal increments on $[-\pi,\pi]$. We further assume that there exist a function $A(u,\lambda)$ in $L^2([0,1]\times[-\pi,\pi])$ and two constants $c > 0$ and $D < 1/2$ such that

$$|A^0_{t,T}(\lambda) - A(t/T,\lambda)| \le c\, T^{-1} |\lambda|^{-D}, \quad 1 \le t \le T,\ -\pi \le \lambda \le \pi, \qquad (3)$$

and

$$|A(u;\lambda) - A(v;\lambda)| \le c\, |v-u|\, |\lambda|^{-D}, \quad 0 \le u, v \le 1,\ -\pi \le \lambda \le \pi. \qquad (4)$$

These conditions correspond to the definition of locally stationary processes introduced in [?], but with the term $|\lambda|^{-D}$ added in the upper bound to allow a singularity at the zero frequency. Relations (1), (3) and (4) give rise to the following time-varying generalized spectral density of $\{X_{t,T}\}$:

$$f(u,\lambda) = |1 - e^{-i\lambda}|^{-2p}\, |A(u;\lambda)|^2. \qquad (5)$$
The first multiplicative factor on the right-hand side of (5) corresponds to the operator $\Delta^p$ on the left-hand side of (1). We now focus on time-varying generalized spectral densities exhibiting a memory parameter at zero frequency.

Definition 1. We say that the process $\{X_{t,T},\ t = 1, \dots, T,\ T \ge 1\}$ has local memory parameter $d(u) \in (-\infty, p+1/2)$ at time $u \in [0,1]$ if it satisfies (1), (3) and (4), and if its generalized spectral density $f(u,\lambda)$ defined by (5) satisfies the following semi-parametric type condition:

$$f(u,\lambda) = |1 - e^{-i\lambda}|^{-2d(u)}\, f^*(u,\lambda), \qquad (6)$$

with $f^*(u,0) > 0$ and

$$|f^*(u,\lambda) - f^*(u,0)| \le C\, f^*(u,0)\, |\lambda|^{\beta}, \quad \lambda \in [-\pi,\pi], \qquad (7)$$

where $C > 0$ and $\beta \in (0,2]$.

The assumption on the model is summarized hereafter. 

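Assumption 1. The array $\{X_{t,T}\}$ of real-valued random variables has local memory parameter $d(u) \in (-\infty, p+1/2)$ at time $u \in [0,1]$. Moreover, $\{\epsilon_t\}$ in (2) is a weak white noise such that $E[\epsilon_0] = 0$, $\mathrm{Var}(\epsilon_0) = 1$, and $E[\epsilon_t^4]$ is finite for all $t \in \mathbb{Z}$. The fourth-order cumulants of its spectral representation $dZ(\lambda)$ satisfy

$$\mathrm{Cum}\bigl(dZ(\lambda_k),\ 1 \le k \le 4\bigr) = \kappa_4(\boldsymbol{\lambda})\, d\mu(\boldsymbol{\lambda}), \quad \boldsymbol{\lambda} = (\lambda_k)_{1 \le k \le 4} \in [-\pi,\pi]^4, \qquad (8)$$

where $\kappa_4(\boldsymbol{\lambda}) = \kappa_4(\lambda_1,\lambda_2,\lambda_3)$ is a bounded function defined on $[-\pi,\pi]^3$, and $\mu$ is defined as the measure on $[-\pi,\pi]^4$ such that, for any $(2\pi)$-periodic functions $A_k$, $1 \le k \le 4$,

$$\int_{[-\pi,\pi]^4} \prod_{k=1}^{4} A_k(\lambda_k)\, d\mu(\boldsymbol{\lambda}) = \int_{[-\pi,\pi]^3} A_4(-\lambda_1-\lambda_2-\lambda_3) \prod_{k=1}^{3} A_k(\lambda_k)\, d\lambda_1\, d\lambda_2\, d\lambda_3. \qquad (9)$$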



Assumption (8) is standard for linear representations of time series. It was used by Dahlhaus (for cumulants of all orders) in the original definition of locally stationary processes in [?]. The measure $\mu$ defined by (9) is sometimes denoted as $d\mu(\boldsymbol{\lambda}) = \eta(\lambda_1 + \dots + \lambda_4)\, d\lambda_1 \dots d\lambda_4$, where $\eta$ is the $(2\pi)$-periodic Dirac comb, see e.g. [?]. An immediate consequence of (8) is the following bound on fourth-order cumulants: for any set of $(2\pi)$-periodic functions $A_k$, $1 \le k \le 4$,

$$\Bigl|\mathrm{Cum}\Bigl(\int A_k(\lambda)\, dZ(\lambda),\ 1 \le k \le 4\Bigr)\Bigr| \le c_4\, \|A_1\|_2\, \|A_4\|_2\, \|A_2\|_1\, \|A_3\|_1, \qquad (10)$$

where $c_4$ is a positive constant and $\|A_k\|_p = \bigl(\int_{-\pi}^{\pi} |A_k(\lambda)|^p\, d\lambda\bigr)^{1/p}$.
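One way to see how (10) follows from (8) and (9) is sketched below. The choice $c_4 = \sup|\kappa_4|$ is ours and not necessarily the constant intended in the original derivation. The first line uses multilinearity of cumulants together with (8) and (9). The last line applies the Cauchy-Schwarz inequality in $\lambda_1$, using that $\lambda_1 \mapsto A_4(-\lambda_1-\lambda_2-\lambda_3)$ has the same $L^2$ norm as $A_4$ by $(2\pi)$-periodicity.

$$
\begin{aligned}
\Bigl|\mathrm{Cum}\Bigl(\int A_k\, dZ,\ 1 \le k \le 4\Bigr)\Bigr|
&= \Bigl|\int_{[-\pi,\pi]^3} \kappa_4(\lambda_1,\lambda_2,\lambda_3)\, A_4(-\lambda_1-\lambda_2-\lambda_3) \prod_{k=1}^{3} A_k(\lambda_k)\, d\lambda_1\, d\lambda_2\, d\lambda_3\Bigr| \\
&\le \sup|\kappa_4| \int_{[-\pi,\pi]^2} |A_2(\lambda_2)|\, |A_3(\lambda_3)| \Bigl(\int_{-\pi}^{\pi} |A_1(\lambda_1)|\, |A_4(-\lambda_1-\lambda_2-\lambda_3)|\, d\lambda_1\Bigr) d\lambda_2\, d\lambda_3 \\
&\le \sup|\kappa_4|\; \|A_1\|_2\, \|A_4\|_2\, \|A_2\|_1\, \|A_3\|_1 .
\end{aligned}
$$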



We now give a small series of examples, adapted from [23] to the time-varying setting.

Example 1 (tvFBM($H$)). The fractional Brownian motion (FBM) process $\{B_H(k)\}_{k \in \mathbb{Z}}$ with Hurst index $H \in (0,1)$ is a discrete-time version of $\{B_H(t),\ t \in \mathbb{R}\}$, a Gaussian process with mean zero and covariance

$$E[B_H(t) B_H(s)] = \tfrac{1}{2}\bigl\{|t|^{2H} + |s|^{2H} - |t-s|^{2H}\bigr\}.$$

The spectral density of $\{\Delta B_H(k)\}_{k \in \mathbb{Z}}$ is then given by $\lambda \mapsto |1 - e^{-i\lambda}|^{-2H+1} f_{\mathrm{FBM}}(\lambda; H)$, where

$$f_{\mathrm{FBM}}(\lambda; H) = \left|\frac{2\sin(\lambda/2)}{\lambda}\right|^{2H+1} + |2\sin(\lambda/2)|^{2H+1} \sum_{k \ne 0} |\lambda + 2k\pi|^{-2H-1}. \qquad (11)$$

The time-varying fractional Brownian motion (tvFBM) has generalized spectral density (6) with $p = 1$, $d(u) = H(u) + 1/2 \in (1/2, 3/2)$ and $f^*(u,\lambda) = f_{\mathrm{FBM}}(\lambda; H(u))$, where $H$ is a Lipschitz mapping of $[0,1]$ into a subset of $(0,1)$. Then (7) holds with $\beta = (2H(u)+1) \wedge 2$. The corresponding non-negative local transfer function is

$$A(u,\lambda) = |1 - e^{-i\lambda}|^{1/2 - H(u)} \sqrt{f_{\mathrm{FBM}}(\lambda; H(u))}. \qquad (12)$$

In this case, by Lemma 3 in [?], (4) holds for any $D > \sup_u H(u) - 1/2$.
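The short-memory factor (11) and the transfer function (12) can be evaluated numerically as sketched below. The function names and the truncation of the sum over $k \ne 0$ at `n_terms` terms on each side are our assumptions. The $k = 0$ term is written with `np.sinc`, so that $f_{\mathrm{FBM}}(0; H) = 1$ is obtained without a division by zero.

```python
import numpy as np

def f_fbm(lam, H, n_terms=2000):
    """f_FBM(lambda; H) of (11); the sum over k != 0 is truncated to 1 <= |k| <= n_terms."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    a = 2 * H + 1
    # k = 0 term |2 sin(lam/2) / lam|^a; np.sinc(x) = sin(pi x) / (pi x) handles lam = 0
    head = np.abs(np.sinc(lam / (2 * np.pi))) ** a
    k = np.arange(1, n_terms + 1)
    tail = (np.abs(lam[:, None] + 2 * np.pi * k) ** (-a)
            + np.abs(lam[:, None] - 2 * np.pi * k) ** (-a)).sum(axis=1)
    return head + np.abs(2 * np.sin(lam / 2)) ** a * tail

def A_tvfbm(u, lam, H_of_u):
    """Local transfer function (12); H_of_u is a (hypothetical) callable u -> H(u). Needs lam != 0 if H(u) > 1/2."""
    H = H_of_u(u)
    return np.abs(1 - np.exp(-1j * np.asarray(lam))) ** (0.5 - H) * np.sqrt(f_fbm(lam, H))
```

A convenient check is $H = 1/2$ (Brownian motion). The identity $\sum_{k \in \mathbb{Z}} (x + k\pi)^{-2} = \sin^{-2}(x)$ gives $f_{\mathrm{FBM}} \equiv 1$, which the truncated sum reproduces up to an error of order $1/\text{n\_terms}$.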

Example 2 (tvFGN($H$)). The time-varying fractional Gaussian noise (tvFGN) is defined similarly to the tvFBM, by $f^*(u,\lambda) = f_{\mathrm{FBM}}(\lambda; H(u))$, but with $p = 0$ and $d(u) = H(u) - 1/2 \in (-1/2, 1/2)$.

Example 3 (approximated causal tvFBM($H$)). The drawback of the tvFBM (and also of the tvFGN) defined above is the non-causality of the transfer function $A(u,\cdot)$ defined in (12). Since $\{\Delta B_H(k)\}_{k \in \mathbb{Z}}$ is purely non-deterministic, it admits a causal representation. On the other hand, to our knowledge the corresponding transfer function is not explicitly given, and thus (4) is difficult to check. To circumvent this problem, one may construct an example by approximating a causal continuous-time representation of the FBM, see e.g. [30, Chapter 6]. Let us fix $H$ in $(1/2, 1)$. Replacing the integral by a discrete sum in this representation, one obtains the following process:

$$B_H(t) = \sum_{s \in \mathbb{Z}} \bigl\{(t-s)_+^{H-1/2} - (-s)_+^{H-1/2}\bigr\}\, \epsilon_s, \quad t \in \mathbb{Z},$$

where $x_+ = \max(x,0)$ and $\{\epsilon_s\}_{s \in \mathbb{Z}}$ is a standard Gaussian white noise. Then

$$\Delta B_H(t) = \sum_{k > 0} \bigl\{k^{H-1/2} - (k-1)^{H-1/2}\bigr\}\, \epsilon_{t-k}, \quad t \in \mathbb{Z},$$

is a causal representation of a stationary process. Using an integral approximation of the discrete Fourier transform of the sequence $\{k^{H-1/2} - (k-1)^{H-1/2}\}_{k > 0}$, one can show that, as $\lambda \to 0$, the corresponding transfer function $A_H(\lambda)$ satisfies $|A_H(\lambda)| = C_H |\lambda|^{1/2-H} + O(1)$ for some positive constant $C_H$. Moreover, for any $\varepsilon > 0$, there is a constant $C$ such that, for all $1/2 < H' < H < 1$ and $\lambda \in (-\pi,\pi)$,

$$|A_H(\lambda) - A_{H'}(\lambda)| \le C\, |H - H'|\, |\lambda|^{1/2 - H - \varepsilon}.$$

Let now $H$ be a Lipschitz mapping of $[0,1]$ into a subset of $(1/2, 1)$, and define the approximated causal tvFBM($H$) process by setting $A(u,\lambda) = A_{H(u)}(\lambda)$. Then Condition (4) holds again for any $D > \sup_u H(u) - 1/2$.
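A simulation of the approximated causal tvFBM can be sketched as follows. It assumes the simplest choice $A^0_{t,T} = A(t/T,\cdot)$ (representation (14) below, with $p = 1$), a Gaussian noise, a truncation of the causal moving-average sum at `K` lags, and the initial value $X_{0,T} = 0$. These are all assumptions of this illustration, and the function names are hypothetical. Since the coefficients decay only like $k^{H-3/2}$, the truncation error is not negligible for small `K`.

```python
import numpy as np

def causal_fbm_coeffs(H, K):
    """a_k = k^{H-1/2} - (k-1)^{H-1/2}, k = 1..K: causal MA coefficients of Delta B_H, 1/2 < H < 1."""
    k = np.arange(1, K + 1, dtype=float)
    return k ** (H - 0.5) - (k - 1) ** (H - 0.5)

def simulate_causal_tvfbm(T, H_of_u, K=2000, rng=None):
    """Approximated causal tvFBM with A^0_{t,T} = A(t/T, .) and p = 1 (sketch):
    Delta X_{t,T} = sum_{k=1}^{K} a_k(H(t/T)) eps_{t-k}, truncated at K lags,
    then X_{t,T} = Delta X_{1,T} + ... + Delta X_{t,T}, i.e. X_{0,T} = 0 is assumed."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(T + K)           # eps[K - 1 + t] plays the role of eps_t
    dx = np.empty(T)
    for t in range(1, T + 1):
        past = eps[t - 1 : t - 1 + K][::-1]     # eps_{t-1}, eps_{t-2}, ..., eps_{t-K}
        dx[t - 1] = causal_fbm_coeffs(H_of_u(t / T), K) @ past
    return np.cumsum(dx)

x = simulate_causal_tvfbm(500, lambda u: 0.6 + 0.3 * u, K=1000, rng=0)
```

Recomputing the coefficients at each $t$ is what makes the process time-varying. Each $\Delta X_{t,T}$ uses the causal filter frozen at $H(t/T)$.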

Example 4 (tvARFIMA($q,d,r$)). The time-varying autoregressive fractionally integrated moving average (tvARFIMA($q,d,r$)) process is defined by

$$A(u;\lambda) = (1 - e^{-i\lambda})^{p - d(u)}\, \sigma(u)\, \frac{1 + \sum_{k=1}^{r} \theta_k(u)\, e^{-ik\lambda}}{1 - \sum_{k=1}^{q} \phi_k(u)\, e^{-ik\lambda}}, \qquad (13)$$

where $d: [0,1] \to (-\infty, p+1/2)$, $\sigma: [0,1] \to \mathbb{R}_+$, $\phi = [\phi_1 \dots \phi_q]^T: [0,1] \to \mathbb{R}^q$ and $\theta = [\theta_1 \dots \theta_r]^T: [0,1] \to \mathbb{R}^r$ are Lipschitz functions. They are such that $1 - \sum_{k=1}^{q} \phi_k(u) z^k$ does not vanish for all $u \in [0,1]$ and all $z \in \mathbb{C}$ with $|z| \le 1$. Using this latter condition, the local transfer function $A(u;\cdot)$ defines a causal autoregressive fractionally integrated moving average (ARFIMA($q, d(u)-p, r$)) process. The local generalized spectral density (5) then satisfies the conditions (6) and (7) with $\beta = 2$. Using Lemma 3 in [?], the Lipschitz assumptions on $d$, $\sigma$, $\theta$ and $\phi$ yield Condition (4) with $D > \sup_u d(u) - p$.
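The local transfer function (13) and the resulting generalized spectral density (5) can be evaluated on a frequency grid as sketched below. The callables `d`, `sigma`, `phi`, `theta` and the function names are hypothetical. Frequencies must exclude $\lambda = 0$, where the density is singular. The complex power uses NumPy's principal branch, which is well defined here because $1 - e^{-i\lambda}$ has a non-negative real part.

```python
import numpy as np

def tvarfima_transfer(u, lam, d, sigma, phi, theta, p=0):
    """A(u; lam) of (13). d, sigma: callables u -> float; phi, theta: callables u -> arrays
    [phi_1..phi_q] and [theta_1..theta_r] (possibly empty)."""
    z = np.exp(-1j * np.asarray(lam, dtype=float))
    ar = 1 - sum(c * z ** (k + 1) for k, c in enumerate(phi(u)))
    ma = 1 + sum(c * z ** (k + 1) for k, c in enumerate(theta(u)))
    return (1 - z) ** (p - d(u)) * sigma(u) * ma / ar

def local_gsd(u, lam, d, sigma, phi, theta, p=0):
    """Generalized spectral density (5): |1 - e^{-i lam}|^{-2p} |A(u; lam)|^2, for lam != 0."""
    lam = np.asarray(lam, dtype=float)
    A = tvarfima_transfer(u, lam, d, sigma, phi, theta, p)
    return np.abs(1 - np.exp(-1j * lam)) ** (-2 * p) * np.abs(A) ** 2
```

Two properties can be checked numerically. First, the factor $(1 - e^{-i\lambda})^p$ cancels in (5), so `local_gsd` does not depend on `p`. Second, $f^*(u,\lambda) = \sigma^2(u) |1 + \sum_k \theta_k(u) e^{-ik\lambda}|^2 / |1 - \sum_k \phi_k(u) e^{-ik\lambda}|^2$ is smooth in $\lambda$, consistent with $\beta = 2$.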

The simplest definition of $\{\Delta^p X_{t,T}\}$ in all the previous examples, which makes Condition (3) hold trivially, is to take $A^0_{t,T}(\lambda) = A(t/T,\lambda)$. This amounts to the time-varying linear representation

$$\Delta^p X_{t,T} = \int_{-\pi}^{\pi} A(t/T;\lambda)\, e^{i\lambda t}\, dZ(\lambda), \qquad (14)$$

as will be done for our simulated tvARFIMA in Section 5. However, one might also want to use a different transfer function $A^0_{t,T}$ in (1), provided that Condition (3) holds. Such an approximate time-varying linear representation is motivated by the tvAR($q$) process, which satisfies the recursion

$$X_{t,T} - \sum_{k=1}^{q} \phi_k(t/T)\, X_{t-k,T} = \sigma(t/T)\, \epsilon_t, \quad 1 \le t \le T,$$

along with appropriate initial conditions. It has been shown in [?] that such a non-stationary process does not satisfy a representation of the form (14) (with $p = 0$). It does, however, satisfy (1) and (3) (with $p = D = 0$).
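Both constructions can be sketched in a few lines.

- The tvARFIMA($0,d,0$) with $p = 0$ is simulated through (14). By (13), $A(t/T;\lambda) = \sigma(t/T)(1 - e^{-i\lambda})^{-d(t/T)}$, so $X_{t,T} = \sigma(t/T) \sum_{k \ge 0} \psi_k(d(t/T))\, \epsilon_{t-k}$, with $\psi_k$ the coefficients of $(1 - B)^{-d}$.
- The tvAR(1) is simulated directly from its recursion.

Gaussian noise, the truncation at `K` lags, the initial value `x0` and all function names are assumptions of this sketch. It is not claimed to be the scheme used for the simulations of Section 5.

```python
import numpy as np

def arfima_ma_coeffs(d, K):
    """psi_0..psi_{K-1} with (1 - B)^{-d} = sum_k psi_k B^k: psi_0 = 1, psi_k = psi_{k-1} (k - 1 + d) / k."""
    k = np.arange(1, K, dtype=float)
    return np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))

def simulate_tvarfima0d0(T, d_of_u, sigma_of_u=lambda u: 1.0, K=1000, rng=None):
    """tvARFIMA(0, d, 0), p = 0, via (14): X_{t,T} = sigma(t/T) sum_{k<K} psi_k(d(t/T)) eps_{t-k}."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(T + K - 1)        # eps[K - 2 + t] plays the role of eps_t
    x = np.empty(T)
    for t in range(1, T + 1):
        u = t / T
        past = eps[t - 1 : t - 1 + K][::-1]      # eps_t, eps_{t-1}, ..., eps_{t-K+1}
        x[t - 1] = sigma_of_u(u) * arfima_ma_coeffs(d_of_u(u), K) @ past
    return x

def simulate_tvar1(T, phi_of_u, sigma_of_u=lambda u: 1.0, x0=0.0, rng=None):
    """tvAR(1) recursion X_{t,T} = phi(t/T) X_{t-1,T} + sigma(t/T) eps_t, with X_{0,T} = x0."""
    rng = np.random.default_rng(rng)
    x, prev = np.empty(T), x0
    for t in range(1, T + 1):
        prev = phi_of_u(t / T) * prev + sigma_of_u(t / T) * rng.standard_normal()
        x[t - 1] = prev
    return x
```

The first function follows the "frozen filter" construction $A^0_{t,T} = A(t/T,\cdot)$. The second runs a recursion whose output, as recalled above, is not of the form (14).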

3. Estimation method based on wavelet analysis 

3.1. Discrete wavelet transform (DWT). Following the approach presented in [23] for the estimation of the memory parameter of a stationary sequence, we compute the discrete wavelet transform (DWT) of $\{X_{t,T},\ 1 \le t \le T\}$ (in discrete time) for a given scale function $\phi$ and wavelet $\psi$. We denote by $\{W_{j,k;T},\ j \ge 0,\ k \in \mathbb{Z}\}$ the wavelet coefficients of the process $\{X_{t,T},\ 1 \le t \le T\}$,

$$W_{j,k;T} = \sum_{t=1}^{T} h_{j,2^j k - t}\, X_{t,T}, \quad k = 0, \dots, T_j - 1, \qquad (15)$$

where $\{h_{j,t},\ t \in \mathbb{Z}\}$ denotes the wavelet detail filter at scale $j$ associated with $\phi$ and $\psi$ through the relation

$$h_{j,t} = 2^{-j/2} \int_{-\infty}^{\infty} \phi(u + t)\, \psi(2^{-j} u)\, du,$$

and $T_j$ denotes the number of available wavelet coefficients at scale $j$, which satisfies

$$T 2^{-j} - c \le T_j \le T 2^{-j}, \quad \text{for some constant } c \text{ independent of } j \ge 0. \qquad (16)$$

The filter $h_{j,\cdot}$ and $T_j$ are defined so that the support $\{t : h_{j,2^j k - t} \ne 0\}$ is included in $\{1, \dots, T\}$ for $k = 0, \dots, T_j - 1$. Observe that here $j$ denotes the scale index of the wavelet coefficient and $k$ its position index. We use the convention that a large $j$ corresponds to a coarse scale. Let us define

$$H_j(\lambda) = \sum_{t} h_{j,t}\, e^{-it\lambda}, \qquad (17)$$

the corresponding filter transfer function. The following conditions on the wavelet $\psi$ and scale function $\phi$ are assumed to hold for a positive integer $M$ and a real $\alpha$.

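(W-1) $\phi$ and $\psi$ are compactly supported and integrable, with $\int_{-\infty}^{\infty} \phi(t)\, dt = 1$ and $\int_{-\infty}^{\infty} \psi^2(t)\, dt = 1$.

(W-2) There exists $\alpha > 1$ such that $\sup_{\xi \in \mathbb{R}} |\hat\psi(\xi)|\, (1 + |\xi|)^{\alpha} < \infty$, where $\hat\psi(\xi) = \int_{-\infty}^{\infty} \psi(t)\, e^{-i\xi t}\, dt$ denotes the Fourier transform of $\psi$.

(W-3) The function $\psi$ has $M$ vanishing moments, i.e. $\int_{-\infty}^{\infty} t^m \psi(t)\, dt = 0$ for all $m = 0, \dots, M-1$.

(W-4) The function $\sum_{k \in \mathbb{Z}} k^m \phi(\cdot - k)$ is a polynomial of degree $m$ for all $m = 0, \dots, M-1$.

Under (W-3) and (W-4), the filter $h_{j,\cdot}$ can be interpreted as the convolution of the $\Delta^M$ filter with a finite impulse response filter. Hence, if $M \ge p$, Equation (15) may be written as

$$W_{j,k;T} = \sum_{t=1}^{T} h^{(p)}_{j,2^j k - t}\, (\Delta^p X)_{t,T}, \quad k = 0, \dots, T_j - 1,$$

where $h_{j,\cdot} = h^{(p)}_{j,\cdot} \star \Delta^p$. In particular, we have

$$H^{(p)}_j(\lambda) = \sum_t h^{(p)}_{j,t}\, e^{-it\lambda} = H_j(\lambda)\, (1 - e^{-i\lambda})^{-p}. \qquad (18)$$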
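For illustration only, the wavelet coefficients can be computed with the Haar pyramid sketched below. Haar is our choice here, not the paper's. It has $M = 1$ vanishing moment, so it only covers $p \le 1$. Its Fourier transform decays only like $|\xi|^{-1}$, so it does not satisfy (W-2) with $\alpha > 1$; smoother wavelets with more vanishing moments are needed to meet the stated conditions. Scale $j$ yields $\lfloor T 2^{-j} \rfloor$ coefficients, in line with (16) with $c = 1$. Indexing and sign conventions may differ from (15) by a shift. The function name is hypothetical.

```python
import numpy as np

def haar_dwt(x, J):
    """Haar detail coefficients {j: W_{j,.}} for j = 1..J via the pyramid algorithm (sketch).

    At scale j, W_{j,k} = 2^{-j/2} (sum of the first half - sum of the second half) of the
    k-th block of 2^j consecutive samples; trailing samples that do not fill a block are dropped.
    """
    a = np.asarray(x, dtype=float)
    details = {}
    for j in range(1, J + 1):
        n = len(a) // 2
        even, odd = a[0:2 * n:2], a[1:2 * n:2]
        details[j] = (even - odd) / np.sqrt(2)   # detail (wavelet) part at scale j
        a = (even + odd) / np.sqrt(2)            # approximation passed to the next scale
    return details
```

With one vanishing moment, constants are annihilated at every scale, which is the discrete counterpart of (W-3) with $M = 1$.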

3.2. Local wavelet spectrum, local scalogram, tangent scalogram, and final estimator. Recall that $f(u,\cdot)$ in (5) can be interpreted as a local generalized spectral density at rescaled time $u \in [0,1]$. Hence, as in the stationary setting used in [23], for each such $u$ we may define a local wavelet spectrum $\sigma^2(u) = \{\sigma_j^2(u),\ j \ge 0\}$. Here, for each $j \ge 0$, $\sigma_j^2(u)$ is the variance of the wavelet coefficients at scale index $j$ of a process with generalized spectral density $f(u,\cdot)$. This variance is well defined under the assumption $M \ge p$, because in this case the wavelet coefficients at a given scale are weakly stationary. Moreover, by (5) and (18),

$$\sigma_j^2(u) = \int_{-\pi}^{\pi} |H_j(\lambda)|^2 f(u;\lambda)\, d\lambda = \int_{-\pi}^{\pi} \bigl|H^{(p)}_j(\lambda)\, A(u;\lambda)\bigr|^2\, d\lambda.$$
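To make this integral concrete, the sketch below evaluates $\sigma_j^2 = \int_{-\pi}^{\pi} |H_j(\lambda)|^2 f(\lambda)\, d\lambda$ by the trapezoidal rule. It uses two illustrative assumptions: the discrete Haar filter (which, as noted, violates (W-2)) and the stationary density $f(\lambda) = |1 - e^{-i\lambda}|^{-2d}$ (so $p = 0$ and $f^* \equiv 1$). The successive ratios $\log_2(\sigma_{j+1}^2/\sigma_j^2)$ approach $2d$, which is the scaling behind the log-regression made precise in (40). All names are hypothetical.

```python
import numpy as np

def haar_filter(j):
    """Discrete Haar detail filter h_{j,t}, t = 0..2^j - 1, with unit l2 norm (illustrative choice)."""
    half = 2 ** (j - 1)
    return np.concatenate([np.ones(half), -np.ones(half)]) / 2.0 ** (j / 2)

def transfer(h, lam):
    """H_j(lambda) = sum_t h_{j,t} exp(-i t lambda), as in (17)."""
    H = np.zeros(lam.shape, dtype=complex)
    for t, h_t in enumerate(h):
        H += h_t * np.exp(-1j * t * lam)
    return H

def wavelet_variance(j, d, n_grid=2 ** 15):
    """sigma_j^2 for f(lam) = |1 - e^{-i lam}|^{-2d}: trapezoidal rule on (0, pi], doubled by evenness."""
    lam = np.linspace(0.0, np.pi, n_grid + 1)[1:]   # skip lam = 0, where |H_j|^2 f vanishes
    f = np.abs(2.0 * np.sin(lam / 2)) ** (-2.0 * d)
    y = np.abs(transfer(haar_filter(j), lam)) ** 2 * f
    dx = lam[1] - lam[0]
    return 2.0 * dx * (y.sum() - 0.5 * (y[0] + y[-1]))

d = 0.3
for j in range(3, 8):
    slope = np.log2(wavelet_variance(j + 1, d) / wavelet_variance(j, d))
    print(j, round(slope, 4))  # approaches 2 * d = 0.6 as j grows
```

For $d = 0$ the density is flat and the integral equals $2\pi \sum_t h_{j,t}^2 = 2\pi$, which gives a simple accuracy check of the quadrature.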

The following intuitive definition will also be useful when developing our estimation theory using stationary approximations. For any $u \in [0,1]$, one may define a tangent stationary process for the $p$-th increment,

$$\Delta^p X_t(u) = \int_{-\pi}^{\pi} A(u;\lambda)\, e^{i\lambda t}\, dZ(\lambda), \qquad (19)$$

whose spectral density is $|1 - e^{-i\lambda}|^{2p} f(u,\lambda)$. Further, we define the wavelet coefficients of the tangent weakly stationary process at any $u \in [0,1]$, namely,

$$W_{j,k}(u) = \sum_{t=1}^{T} h^{(p)}_{j,2^j k - t}\, (\Delta^p X)_t(u) \qquad (20)$$

$$\phantom{W_{j,k}(u)} = \int_{-\pi}^{\pi} H^{(p)}_j(\lambda)\, A(u;\lambda)\, e^{i\lambda 2^j k}\, dZ(\lambda), \quad k = 0, \dots, T_j - 1. \qquad (21)$$

These wavelet coefficients are indeed those of a process with generalized spectral density $f(u,\cdot)$. Thus the above definition gives

$$\sigma_j^2(u) = E\bigl[W_{j,k}^2(u)\bigr]. \qquad (22)$$
An important tool for the estimation of the long memory is the scalogram (first introduced in this context by [32] and developed by [?]), defined as

$$\hat\sigma_{j;T}^2 = \frac{1}{T_j} \sum_{k=0}^{T_j - 1} W_{j,k;T}^2.$$

Here, to cope with local stationarity, we will need a local scalogram for estimating the local wavelet spectrum, namely, for a given $u \in [0,1]$,

$$\hat\sigma_{j;T}^2(u) = \sum_{k=0}^{T_j - 1} \gamma_{j,T}(k)\, W_{j,k;T}^2, \qquad (23)$$

where $\{\gamma_{j,T}(k)\}$ are non-negative weights localized at indices $k \approx u T_j$ and normalized in such a way that

$$\sum_{k=0}^{T_j - 1} \gamma_{j,T}(k) = 1. \qquad (24)$$

The localization property will be expressed by imposing a bound on the growth rate, as $T \to \infty$, of the following quantity (see equation (33)):

$$\Gamma_q(u;j,T) = \sum_{k=0}^{T_j - 1} |\gamma_{j,T}(k)|\, |k - uT2^{-j}|^q, \qquad (25)$$

for appropriate values of the exponent $q$.
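A direct transcription of (23), (24) and (25) for a single scale $j$ is sketched below. It assumes the coefficients $W_{j,k;T}$, $k = 0, \dots, T_j - 1$, are already available as an array. The function names are hypothetical. With uniform weights $\gamma_{j,T}(k) = 1/T_j$, the local scalogram reduces to the usual (global) scalogram.

```python
import numpy as np

def local_scalogram(W_j, gamma_j):
    """Local scalogram (23): sum_k gamma_{j,T}(k) W_{j,k;T}^2, with weights normalized as in (24)."""
    W_j = np.asarray(W_j, dtype=float)
    gamma_j = np.asarray(gamma_j, dtype=float)
    if W_j.shape != gamma_j.shape or not np.isclose(gamma_j.sum(), 1.0):
        raise ValueError("weights must match the coefficients and sum to one, cf. (24)")
    return float(np.sum(gamma_j * W_j ** 2))

def gamma_q(gamma_j, u, T, j, q):
    """Localization measure (25): sum_k |gamma_{j,T}(k)| |k - u T 2^{-j}|^q."""
    k = np.arange(len(gamma_j))
    return float(np.sum(np.abs(gamma_j) * np.abs(k - u * T * 2.0 ** (-j)) ** q))
```

For $q = 0$, $\Gamma_0$ is just $\sum_k |\gamma_{j,T}(k)| = 1$. For $q = 1, 2$, it measures how far from $uT2^{-j}$ the weights spread, which is what condition (33) controls.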

An important tool for studying the local scalogram is the tangent scalogram, defined as

$$\tilde\sigma_{j;T}^2(u) = \sum_{k=0}^{T_j - 1} \gamma_{j,T}(k)\, W_{j,k}^2(u). \qquad (26)$$

This definition is similar to that of the local scalogram in (23), but with the wavelet coefficients $W_{j,k;T}$ replaced by the tangent wavelet coefficients $W_{j,k}(u)$ defined in (20). The tangent scalogram is not an estimator, since it cannot be computed from the observations $X_{1,T}, \dots, X_{T,T}$. However, it provides useful approximations of the local scalogram.

We conclude this section by deriving an estimator of the time-varying long-memory parameter. The local wavelet spectrum is related to the local memory parameter $d(u)$ by the asymptotic property $\sigma_j^2(u) \sim C\, 2^{2d(u)j}$ as $j \to \infty$, for some constant $C > 0$. This property will be made more precise when we study the bias (see relation (40) below).

An estimator of $d(u)$ is obtained by a linear regression of $(\log \hat\sigma_{j;T}^2(u))_{j = L, \dots, L+\ell}$ with respect to $j = L, \dots, L+\ell$. Here $L$ is the lowest scale index used in the regression and $\ell + 1$ is the number of scales used. Let $w = [w_0, \dots, w_\ell]^T$ be a vector satisfying

$$\sum_{i=0}^{\ell} w_i = 0 \quad \text{and} \quad 2\log(2) \sum_{i=0}^{\ell} i\, w_i = 1. \qquad (27)$$

Condition (27) is designed so that, if $\log\sigma_j^2(u) = a + 2d(u)\, j \log 2$ held exactly for $j = L, \dots, L+\ell$, then $\sum_{i=0}^{\ell} w_i \log\sigma_{L+i}^2(u) = d(u)$. The first constraint removes the intercept $a$, and the second normalizes the slope.

The local estimator of $d(u)$ is defined as

$$\hat d_T(L) = \sum_{j=L}^{L+\ell} w_{j-L}\, \log\bigl(\hat\sigma_{j;T}^2(u)\bigr). \qquad (28)$$
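One convenient choice of $w$ satisfying (27) is the ordinary least-squares weights $w_i = (i - \bar\imath) / \bigl(2\log(2) \sum_{i'} (i' - \bar\imath)^2\bigr)$, with $\bar\imath = \ell/2$. This is our illustrative choice; any $w$ satisfying (27) is allowed. The sketch computes these weights and the estimator (28) from the local scalograms at scales $L, \dots, L+\ell$. The function names are hypothetical, and $\ell \ge 1$ is required.

```python
import numpy as np

def regression_weights(ell):
    """OLS weights w_0..w_ell satisfying (27): sum_i w_i = 0 and 2 log(2) sum_i i w_i = 1 (ell >= 1)."""
    i = np.arange(ell + 1, dtype=float)
    c = i - i.mean()
    return c / (2 * np.log(2) * np.sum(c ** 2))

def d_hat(local_scalograms, w):
    """Estimator (28): sum_i w_i log(sigma_hat^2_{L+i;T}(u)); local_scalograms[i] is the value at scale L + i."""
    return float(np.dot(w, np.log(local_scalograms)))

# exact scaling sigma_j^2 = 3 * 2^{2 d j} with d = 0.25 is recovered exactly
w = regression_weights(3)
print(d_hat(3.0 * 2.0 ** (2 * 0.25 * np.arange(5, 9)), w))  # 0.25 up to rounding
```

In practice the local scalograms only satisfy this scaling approximately. By Theorem 1 below, the resulting bias is driven by the $O(2^{-\beta L})$ term.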

3.3. Conditions on the weights $\gamma_{j,T}(k)$ and examples. Let us now state our conditions on the weights $\gamma_{j,T}(k)$ precisely. Denote, for any $0 \le i \le j$, $v \in \{0, \dots, 2^i - 1\}$ and $\lambda \in \mathbb{R}$,

$$\Phi_{j,T}(\lambda; i, v) = \sum_{l \in \mathcal{T}_j(i,v)} \gamma_{j-i,T}(2^i l + v)\, e^{il\lambda}, \qquad (29)$$

where

$$\mathcal{T}_j(i,v) = \{l : 0 \le l < 2^{-i}(T_{j-i} - v)\}. \qquad (30)$$

We moreover define

$$\delta_{j,T} = \sup_k |\gamma_{j,T}(k)|. \qquad (31)$$

The weights $\gamma_{j,T}(k)$ must have an appropriate asymptotic behavior as $T \to \infty$ in order to obtain a consistent estimator of $d(u)$. More precisely, the following assumption will be required.

Assumption 2. The index $j$ depends on $T$ so that the weights $(\gamma_{j,T}(k))_k$ satisfy the following asymptotic properties as $T \to \infty$.

(i) We have $\delta_{j,T} \to 0$, and for any fixed integer $i$, $\delta_{j+i,T} \sim 2^i \delta_{j,T}$.

(ii) For all $i, i' \ge 0$, $v \in \{0, \dots, 2^i - 1\}$ and $v' \in \{0, \dots, 2^{i'} - 1\}$, there exists a constant $V = V(i,v;i',v')$ such that

$$\delta_{j,T}^{-1} \int_{-\pi}^{\pi} \Phi_{j,T}(\lambda; i, v)\, \overline{\Phi_{j,T}(\lambda; i', v')}\, d\lambda \to V(i,v;i',v'). \qquad (32)$$

(iii) For all $\eta > 0$, $i \ge 0$ and $v \in \{0, \dots, 2^i - 1\}$, we have

$$\delta_{j,T}^{-1/2} \sup_{\eta \le |\lambda| \le \pi} |\Phi_{j,T}(\lambda; i, v)| \to 0.$$

(iv) For $q = 0, 1, 2$, we have

$$\Gamma_q(u;j,T) = O\bigl((\delta_{j,T})^{-q}\bigr), \qquad (33)$$

where $\Gamma_q(u;j,T)$ is defined in (25).



Several examples of weights satisfying this assumption are provided below. In these examples, the weights $\gamma_{j,T}(k)$, $k = 0, \dots, T_j - 1$, are entirely determined by $T_j$ and a bandwidth parameter $b_T$, and

$$\delta_{j,T}^{-1} \asymp b_T T_j \asymp b_T T 2^{-j}. \qquad (34)$$

In kernel estimation, one may interpret the bandwidth parameter $b_T$ as a proportion. It is the proportion of wavelet coefficients used for the estimation of the local scalogram $\hat\sigma_{j;T}^2(u)$ at a given scale $j$ and position $u$, among the $T_j$ wavelet coefficients available at scale $j$ from the $T$ observations $X_{1,T}, \dots, X_{T,T}$. Lemmas in the appendix show that, for these examples, Assumption 2 is satisfied as soon as $b_T \to 0$ and $b_T T_j \to \infty$. The exception is the non-compactly supported kernel case treated there. In that case we assume in addition that $T_j \exp(-c' b_T T_j) = O(1)$ for any $c' > 0$, which holds in the typical asymptotic setting $b_T \asymp T^{-\zeta}$ with $\zeta \in (0,1)$.
Example 5 (Two-sided kernel weights). For $u \in (0,1)$, the numbers of observations before and after rescaled time $u$ both tend to infinity. Thus we may use a two-sided kernel to localize the memory parameter estimator around $u$. For a given bandwidth $b = b_T$, the corresponding weights are given by

$$\gamma_{j,T}(k) = \rho_{j,T}\, K\bigl((uT_j - k)/(b_T T_j)\bigr), \qquad (35)$$

where $K$ is a non-negative symmetric function and $\rho_{j,T}$ is a normalizing term such that (24) holds. In the last display, $b_T$ is the bandwidth in the rescaled-time sense, while $b_T T_j$ is the corresponding bandwidth in terms of location indices $k = 0, 1, \dots, T_j - 1$ at scale $j$. A lemma in the appendix shows that Assumption 2 holds for a wide variety of choices of $K$. In particular, Assumption 2 holds with $\delta_{j,T} \asymp (b_T T_j)^{-1}$ and $V(i,v;i',v') = 2\pi\, 2^{-i-i'}$ under the following assumption.

Assumption 3. The weights $(\gamma_{j,T}(k))$ are defined by (35) with $K = \mathbb{1}_{[-1/2,1/2]}$. Moreover, as $T \to \infty$, $b_T \to 0$ and $T_j b_T \to \infty$.
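The weights (35) can be sketched as follows. The rectangular kernel of Assumption 3 is the default, and other non-negative symmetric kernels can be passed as the hypothetical argument `K`. The normalization by the sum of the raw kernel values plays the role of $\rho_{j,T}$. The maximal weight then equals one over the number of coefficients in the window, which illustrates $\delta_{j,T} \asymp (b_T T_j)^{-1}$.

```python
import numpy as np

def two_sided_weights(u, T_j, b_T, K=lambda x: (np.abs(x) <= 0.5).astype(float)):
    """Weights (35): gamma_{j,T}(k) = rho_{j,T} K((u T_j - k) / (b_T T_j)), k = 0..T_j-1,
    with rho_{j,T} chosen so that (24) holds. Default K = indicator of [-1/2, 1/2] (Assumption 3)."""
    k = np.arange(T_j)
    raw = K((u * T_j - k) / (b_T * T_j))
    total = raw.sum()
    if total == 0:
        raise ValueError("no coefficient falls in the kernel window; increase b_T or T_j")
    return raw / total

g = two_sided_weights(0.5, 1024, 0.125)   # window |512 - k| <= 64, i.e. 129 coefficients
```

These weights can be fed directly to a local scalogram of the form (23). They use $b_T T_j$ coefficients on each side... more precisely, about $b_T T_j / 2$ on each side of $uT_j$.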

Example 6 (Recursive weights). By recursive weights, we here mean weights for which, given $T$, $L$ and $w$, the estimator $\hat d_T(L)$ at rescaled time $u$ can be computed by successive simple linear processing. This processing involves a finite number of operations after each new observation $X_{t,T}$, as $t$ grows from $t = 1$ to $t = T$.

Because the DWT is defined as a succession of finite filtering and decimation steps, it is indeed possible to compute $W_{j,k;T}$ online for all $j \in \{L, \dots, L+\ell\}$ and $k \in \{0, \dots, T_j - 1\}$. An online implementation of the local recursive scalogram can then be obtained by setting

$$\bar\sigma_{j,-1;T}^2 = 0, \quad j \in \{L, \dots, L+\ell\},$$

and using the following recursive equation for all $j \in \{L, \dots, L+\ell\}$ and $k \in \{0, \dots, T_j - 1\}$:

$$\bar\sigma_{j,k;T}^2 = \exp\bigl(-(b_T T_j)^{-1}\bigr)\, \bar\sigma_{j,k-1;T}^2 + W_{j,k;T}^2.$$

Here $(b_T T_j)^{-1}$ is the exponential forgetting exponent corresponding to the bandwidth parameter $b_T$. For any $u \in (0,1]$, we define a local recursive scalogram by

$$\hat\sigma_{j;T}^2(u) = \rho_{j,T}\bigl(\lfloor uT_j \rfloor - 1\bigr)^{-1}\, \bar\sigma_{j,\lfloor uT_j \rfloor - 1;T}^2,$$

where $\lfloor a \rfloor$ denotes the integer part of $a$ and

$$\rho_{j,T}(n) = \sum_{k=0}^{n} \exp\bigl(-k/(b_T T_j)\bigr). \qquad (36)$$

Hence (23) and (24) hold with

$$\gamma_{j,T}(k) = \rho_{j,T}\bigl(\lfloor uT_j \rfloor - 1\bigr)^{-1} \exp\Bigl(-\frac{\lfloor uT_j \rfloor - 1 - k}{b_T T_j}\Bigr)\, \mathbb{1}_{\{0, \dots, \lfloor uT_j \rfloor - 1\}}(k). \qquad (37)$$

Observe that these weights are one-sided by construction, since only the observations before rescaled time $u$ are used for estimating $d(u)$. This is why we may take $u \in (0,1]$. A lemma in the appendix shows that Assumption 2 holds for these weights, provided that $b_T \to 0$ and $T_j b_T \to \infty$.
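The online recursion can be sketched for a single scale $j$ as below. The normalizer $\rho_{j,T}(n)$ of (36) obeys the same recursion with $W^2_{j,k;T}$ replaced by $1$. Each new coefficient therefore costs two multiply-adds, and the local recursive scalogram at $u$ is the entry with index $n = \lfloor uT_j \rfloor - 1$. The function name is hypothetical, and the coefficients are assumed to arrive in order $k = 0, 1, \dots$.

```python
import numpy as np

def recursive_local_scalograms(W_j, b_T):
    """Example 6 at one scale j (sketch): returns s[n] = bar_sigma^2_{j,n;T} / rho_{j,T}(n), n = 0..T_j-1,
    where bar_sigma^2_{j,n;T} = exp(-1/(b_T T_j)) bar_sigma^2_{j,n-1;T} + W_{j,n;T}^2 and
    rho_{j,T}(n) = exp(-1/(b_T T_j)) rho_{j,T}(n-1) + 1, both started from 0 at n = -1."""
    W_j = np.asarray(W_j, dtype=float)
    T_j = len(W_j)
    forget = np.exp(-1.0 / (b_T * T_j))
    bar, rho = 0.0, 0.0
    out = np.empty(T_j)
    for n, w in enumerate(W_j):
        bar = forget * bar + w ** 2   # exponentially discounted sum of squared coefficients
        rho = forget * rho + 1.0      # the same discounting applied to 1, i.e. (36)
        out[n] = bar / rho
    return out
```

Unrolling the recursion shows that `out[n]` equals $\sum_k \gamma_{j,T}(k) W_{j,k;T}^2$ with the weights (37) for $n = \lfloor uT_j \rfloor - 1$. The running sums never need to be recomputed from scratch.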
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4. Asymptotic properties 

We study the asymptotic properties of dx{L) defined by (|28p as L, T — > oo in such a way that 
Assumption[2]holds for each j = L, L+l, . . . , L+l and for the chosen weights jj^{k). We provide 
further conditions on L, T, Slt under which consistency holds and derive the corresponding rate 
of convergence (Corollary [2]). Under strengthened conditions, we further show that dx(L) is 
asymptotically normal (Theorem [3j) . These results essentially follow from asymptotic results on 
the tangent scalogram (Theorem [21 Relations (|40p and (|6ip ) and approximation results on the 
local scalogram (Proposition [1]) based on the tangent scalogram. 



4.1. Asymptotic properties of the local scalogram. In order to derive asymptotic results for $\widehat\sigma^2_{j;T}(u)$, we first establish a bound on the error made when approximating $\widehat\sigma^2_{j;T}(u)$ by the tangent scalogram $\bar\sigma^2_{j;T}(u)$.



Proposition 1. Let $u\in[0,1]$ and consider a model satisfying Assumption (…). Assume (W-1)-(W-4) hold with $M > p\vee(d(u)-1/2)$ and $\alpha > 1/2-d(u)$. Suppose moreover that Assumption (…) holds. Then the following approximation holds:

$$E\big[(\widehat\sigma^2_{j;T}(u)-\bar\sigma^2_{j;T}(u))^2\big] = O\big(2^{(6+4p)j}\,T^{-4}\,\delta_{j,T}^{-4} + 2^{(3+2p+2d(u))j}\,T^{-2}\,\delta_{j,T}^{-2}\big). \qquad (38)$$



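Next, we derive a bound on the mean square error for estimating $f^*(u,0)\,\mathbf K(d(u))\,2^{2jd(u)}$ using the estimator $\widehat\sigma^2_{j;T}(u)$, where $\mathbf K$ is the function defined by

$$\mathbf K(d) = \int_{\mathbb R}|\xi|^{-2d}\,|\widehat\psi(\xi)|^2\,d\xi,\qquad 1/2-\alpha<d<M+1/2. \qquad (39)$$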



In fact, as the estimator $\widehat d_T(L)$ is defined in (28) using $\widehat\sigma^2_{j;T}(u)$ with $j = L+i$, $i = 0,\dots,\ell$, and as $L,T\to\infty$, it will be convenient to normalize these quantities by $2^{2Ld(u)}$, so that $f^*(u,0)\,\mathbf K(d(u))\,2^{2jd(u)}/2^{2Ld(u)} = f^*(u,0)\,\mathbf K(d(u))\,2^{2id(u)}$ does not depend on L.
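The constant $\mathbf K(d)$ in (39) depends only on the wavelet. As an illustration only, the sketch below evaluates it numerically for the Haar wavelet, for which $|\widehat\psi(\xi)|^2 = 16\sin^4(\xi/4)/\xi^2$ under the convention $\widehat\psi(\xi)=\int\psi(t)e^{-i\xi t}\,dt$. The Haar wavelet has M = 1 and α = 1, so (39) is finite for $-1/2<d<3/2$; this choice is an assumption made for the example, since the numerical study of Section 5 uses a Daubechies wavelet with M = 2. At d = 0, Parseval's identity gives $\mathbf K(0) = 2\pi\|\psi\|_2^2 = 2\pi$, which serves as a sanity check. The helper names are hypothetical.

```python
import numpy as np

# 64-point Gauss-Legendre rule on [-1, 1]; all nodes are interior points
_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(64)


def _gauss(f, a, b):
    """Integrate f over [a, b] with the Gauss-Legendre rule (f is never evaluated at a or b)."""
    x = 0.5 * (b - a) * _NODES + 0.5 * (a + b)
    return 0.5 * (b - a) * np.sum(_WEIGHTS * f(x))


def haar_psi_hat_sq(xi):
    """|psi_hat(xi)|^2 for the Haar wavelet psi = 1_[0,1/2) - 1_[1/2,1)."""
    return 16.0 * np.sin(xi / 4.0) ** 4 / xi ** 2


def K_haar(d, n_periods=400):
    """Numerical value of K(d) = int_R |xi|^(-2d) |psi_hat(xi)|^2 dxi (Haar case only)."""
    if not -0.5 < d < 1.5:
        raise ValueError("K(d) is finite only for -1/2 < d < 3/2 with the Haar wavelet")
    f = lambda xi: xi ** (-2.0 * d) * haar_psi_hat_sq(xi)
    P = 4.0 * np.pi  # period of sin^4(xi / 4)
    total = 0.0
    # First period: dyadic splitting toward 0, where the integrand behaves like xi^(2-2d) / 16
    edges = P * 2.0 ** -np.arange(40, -1, -1)
    for a, b in zip(edges[:-1], edges[1:]):
        total += _gauss(f, a, b)
    # Remaining full periods up to A = n_periods * P
    for k in range(1, n_periods):
        total += _gauss(f, k * P, (k + 1) * P)
    # Tail beyond A: sin^4 replaced by its mean 3/8, giving 6 * A^(-2d-1) / (2d+1)
    A = n_periods * P
    total += 6.0 * A ** (-2.0 * d - 1.0) / (2.0 * d + 1.0)
    return 2.0 * total  # the integrand is even in xi


print(abs(K_haar(0.0) - 2 * np.pi) < 1e-4)  # True
```

Splitting the integration range into whole periods of $\sin^4(\xi/4)$ keeps each Gauss-Legendre piece smooth; the accuracy degrades as d approaches the endpoints $-1/2$ or $3/2$, where the integrand decays or blows up slowly.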



Theorem 1. Let $u\in[0,1]$ and consider a model satisfying Assumption (…). Assume (W-1)-(W-4) hold with $M > p\vee d(u)$ and $\alpha > (1+\beta)/2-d(u)$. Then we have, as $j\to\infty$,

$$\sigma_j^2(u) = f^*(u,0)\,\mathbf K(d(u))\,2^{2jd(u)}\,\{1+O(2^{-\beta j})\}. \qquad (40)$$

Suppose moreover that Assumption 2 holds and that

$$2^{(3+2\{p-d(u)\})L}\,T^{-2}\,\delta_{L,T}^{-2}\to0. \qquad (41)$$

Then we have, for $j = L+i$ with $i = 0,\dots,\ell$,

$$E\Big[\big(2^{-2Ld(u)}\,\widehat\sigma^2_{j;T}(u) - f^*(u,0)\,\mathbf K(d(u))\,2^{2id(u)}\big)^2\Big] = O\big(\delta_{L,T} + 2^{(3+2\{p-d(u)\})L}\,T^{-2}\,\delta_{L,T}^{-2} + 2^{-2\beta L}\big). \qquad (42)$$



Using the approximation result stated in Proposition 1, we may also wish to obtain a central limit theorem (CLT) for the local scalogram. To this end, we must first derive a CLT for the tangent scalogram. Define, for any integer $\ell\ge0$ and $d\in(1/2-\alpha,M]$, the $2^\ell$-dimensional cross-spectral density $\mathbf D_{\infty,\ell}(\lambda;d) = [D_{\infty,\ell,v}(\lambda;d)]_{v=0,\dots,2^\ell-1}$ of the DWT of the generalized fractional Brownian motion (see [23]) by

$$\mathbf D_{\infty,\ell}(\lambda;d) = \sum_{l\in\mathbb Z}|\lambda+2l\pi|^{-2d}\,\mathbf e_\ell(\lambda+2l\pi)\,\overline{\widehat\psi(\lambda+2l\pi)}\,\widehat\psi(2^{-\ell}(\lambda+2l\pi)),$$

where, for all $\xi\in\mathbb R$,

$$\mathbf e_\ell(\xi) = 2^{-\ell/2}\,[1,\ e^{-i2^{-\ell}\xi},\ \dots,\ e^{-i(2^\ell-1)2^{-\ell}\xi}]^T.$$

In other words, $\mathbf D_{\infty,\ell}(\lambda;d)$ is a vector with entries

$$D_{\infty,\ell,v}(\lambda;d) = 2^{-\ell/2}\sum_{l\in\mathbb Z}|\lambda+2l\pi|^{-2d}\,e^{-i2^{-\ell}v(\lambda+2l\pi)}\,\overline{\widehat\psi(\lambda+2l\pi)}\,\widehat\psi(2^{-\ell}(\lambda+2l\pi)),\qquad 0\le v<2^\ell.$$
We can now state the CLT for the tangent scalogram. 



Theorem 2. Let $u\in[0,1]$ and consider a model satisfying Assumption (…). Suppose that (W-1)-(W-4) hold with $M > p\vee d(u)$ and $\alpha > 1/2-d(u)$. Suppose moreover that one of the two following assertions holds.

(a) Assumption 3 holds and $\{\epsilon_t\}$ is an i.i.d. sequence.

(b) Assumption 2 holds and $\{\epsilon_t\}$ is a Gaussian process.

Then, for any $\ell\ge0$, the following weak convergence holds:

$$\bar{\mathbf s}_L(u) - E[\bar{\mathbf s}_L(u)] \Rightarrow \mathcal N\big(0,(f^*(u,0))^2\,\Sigma(u)\big), \qquad (43)$$

where

$$\bar{\mathbf s}_L(u) = 2^{-2Ld(u)}\,\delta_{L,T}^{-1/2}\,[\bar\sigma^2_{L;T}(u),\ \bar\sigma^2_{L+1;T}(u),\ \dots,\ \bar\sigma^2_{L+\ell;T}(u)]^T, \qquad (44)$$

and $\Sigma(u)$ is the $(\ell+1)\times(\ell+1)$ symmetric matrix defined by

$$\Sigma_{i,i'}(u) = 2\cdot2^{(1+4d(u))i}\sum_{v=0}^{2^{i-i'}-1}V(0,0;i-i',v)\int_{-\pi}^{\pi}|D_{\infty,i-i',v}(\lambda;d(u))|^2\,d\lambda \qquad (45)$$

on the bottom-left triangle $0\le i'\le i\le\ell$, with $V(0,0;i-i',v)$ defined in (32).

Remark 1. A CLT for the sum of squares of the wavelet coefficients of a stationary long-memory process was established in [22] for Gaussian processes and extended in [28] to linear processes. We separate two sets of assumptions in Theorem 2. The result of the linear case is directly applicable under Assumption 3 in (a), since the weights are then constant. To obtain a CLT for general weights (Assumption 2 in (b)), we use the additional Gaussian assumption. Avoiding the Gaussian assumption for such general weights would require revisiting the results in [28] to obtain a CLT for sums of weighted squares of decimated linear processes. Such an extension goes beyond the scope of this article.

Applying Proposition 1 and Theorem 2, we immediately get the following result.

Corollary 1. Let both the assumptions of Proposition 1 and Theorem 2 hold. Let L be such that

$$2^{(3+2\{p-d(u)\})L}\,T^{-2}\,\delta_{L,T}^{-3}\to0. \qquad (46)$$

Then, for any $\ell\ge0$, the following weak convergence holds:

$$\widehat{\mathbf s}_L(u) - E[\widehat{\mathbf s}_L(u)] \Rightarrow \mathcal N\big(0,(f^*(u,0))^2\,\Sigma(u)\big), \qquad (47)$$

where

$$\widehat{\mathbf s}_L(u) = 2^{-2Ld(u)}\,\delta_{L,T}^{-1/2}\,[\widehat\sigma^2_{L;T}(u),\ \widehat\sigma^2_{L+1;T}(u),\ \dots,\ \widehat\sigma^2_{L+\ell;T}(u)]^T, \qquad (48)$$

and $\Sigma(u)$ is the $(\ell+1)\times(\ell+1)$ symmetric matrix defined by (45).

4.2. Asymptotic properties of the estimator $\widehat d_T(L)$. The following result establishes the consistency of the estimator $\widehat d_T(L)$ defined in (28), with $\mathbf w = [w_0,\dots,w_\ell]^T$ fulfilling (27), and provides a rate of convergence.

Corollary 2. Under the same assumptions as Theorem 1, if moreover $L\to\infty$, then we have

$$\widehat d_T(L) = d(u) + O_P\big(\delta_{L,T}^{1/2} + 2^{(3/2+\{p-d(u)\})L}\,T^{-1}\,\delta_{L,T}^{-1} + 2^{-\beta L}\big) = d(u) + o_P(1). \qquad (49)$$

Let us determine the optimal rate of convergence of $\widehat d_T(L)$ towards d(u). By balancing the three terms on the right-hand side of (49), we find that for

$$2^L\asymp T^{2/(3+6\beta-2d(u)+2p)}\quad\text{and}\quad b_T\asymp2^L\,(T\delta_{L,T})^{-1}\asymp T^{(2d(u)-2p-2\beta-1)/(3+6\beta-2d(u)+2p)},$$

these three terms are asymptotically of the same order. Hence, for this choice of the lowest scale L and of the bandwidth $b_T$ (recall that $\delta_{L,T}^{-1}\asymp b_T\,T\,2^{-L}\to\infty$), we get

$$\widehat d_T(L) = d(u) + O_P\big(T^{-2\beta/(3+6\beta+2\{p-d(u)\})}\big).$$
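The balancing argument above is plain exponent arithmetic. The sketch below, with the hypothetical helper names `optimal_exponents` and `term_exponents`, returns exponents $a_L$, $a_b$ and r such that $2^L\asymp T^{a_L}$, $b_T\asymp T^{a_b}$ and the rate is $T^{-r}$, and then checks, using $\delta_{L,T}\asymp2^L(b_TT)^{-1}$, that the three terms of (49) have the same order. Exact rational arithmetic avoids rounding issues in the equality check; the parameter values in the example are arbitrary.

```python
from fractions import Fraction as F


def optimal_exponents(beta, p, d):
    """Exponents a_L, a_b, r with 2^L ~ T^a_L, b_T ~ T^a_b and rate T^(-r), from (49)."""
    denom = 3 + 6 * beta + 2 * (p - d)
    a_L = 2 / denom
    a_b = (2 * d - 2 * p - 2 * beta - 1) / denom  # negative means b_T -> 0
    r = 2 * beta / denom
    return a_L, a_b, r


def term_exponents(beta, p, d, a_L, a_b):
    """log_T of the three terms of (49), using delta_{L,T} ~ 2^L / (b_T T)."""
    log_delta = a_L - a_b - 1
    t1 = log_delta / 2                            # delta^{1/2}
    t2 = (F(3, 2) + p - d) * a_L - 1 - log_delta  # 2^{(3/2+p-d)L} T^{-1} delta^{-1}
    t3 = -beta * a_L                              # 2^{-beta L}
    return t1, t2, t3


beta, p, d = F(1), F(0), F(3, 10)
a_L, a_b, r = optimal_exponents(beta, p, d)
print(a_L, a_b, r)  # 5/21 -2/7 5/21
print(term_exponents(beta, p, d, a_L, a_b))  # all three equal -5/21
```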



We observe that the rate of convergence depends on the unknown parameter d(u). The dependence on d(u) comes from the approximation result (38), which appears in (49) through the term $2^{(3/2+\{p-d(u)\})L}\,T^{-1}\,\delta_{L,T}^{-1}$. The other error terms in (49) have rates not depending on d(u), which is consistent with the facts that 1) the rate of convergence does not depend on d in the stationary case [23, Theorem 2], and 2) these two terms come from the tangent weakly stationary approximation. On the other hand, the term $2^{(3/2+\{p-d(u)\})L}\,T^{-1}\,\delta_{L,T}^{-1}$ may seem unusual for estimating a time-varying parameter of a locally stationary process. For instance, for a time-varying AR (tvAR) process with a Lipschitz-continuous parameter, corresponding to (…) with D = 0, the approximation error due to non-stationarity yields an error term of order $b_T\asymp(T\delta_{L,T})^{-1}$. Indeed, this corresponds to the term (…) with β = 1 in [20, Theorem 2], which is shown to yield a rate-optimal convergence in Theorem 4 of the same reference. Our error term is always larger, as it includes the additional multiplicative factor $2^{(3/2+\{p-d(u)\})L}$, which tends to ∞ since $p-d(u)>-1/2$ and $L\to\infty$. Although we cannot assert that our rate is optimal, it can be explained as follows. In contrast to the tvAR process, our setting is locally semi-parametric, which requires letting L tend to ∞ in order to capture the low-frequency behavior driven by the memory parameter d. It is thus reassuring that, if L were allowed to remain fixed, our error bound would be of the same order as in the locally parametric setting. The fact that letting $L\to\infty$ decreases the rate of convergence is not surprising, as the low-frequency behavior implies large lags in the process, which naturally worsens the quality of the locally stationary approximations. To conclude this discussion, it is interesting to note that the wavelet estimation of the memory parameter of a non-linear process may also yield a rate of convergence depending on the unknown parameter. This is indeed the case for the infinite-source Poisson process; see [14, Remark 4.2].



We now state the asymptotic normality of the estimator, which mainly follows by applying Proposition 1, Theorem 2, the bound (40) and the δ-method as in [21].
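Theorem 3. Let the assumptions of Corollary 1 hold with $\alpha > (1+\beta)/2-d(u)$. Let L be such that

$$2^{(3+2\{p-d(u)\})L}\,T^{-2}\,\delta_{L,T}^{-3}\to0\quad\text{and}\quad2^{-2\beta L}\,\delta_{L,T}^{-1}\to0. \qquad (50)$$

Then the following weak convergence holds:

$$\delta_{L,T}^{-1/2}\,\big(\widehat d_T(L)-d(u)\big) \Rightarrow \mathcal N(0,V(u)), \qquad (51)$$

where $\widehat d_T(L)$ is defined by (28) and

$$V(u) = \frac{\widetilde{\mathbf w}(u)^T\,\Sigma(u)\,\widetilde{\mathbf w}(u)}{\mathbf K^2(d(u))},\qquad\widetilde{\mathbf w}(u) = [w_0,\ w_12^{-2d(u)},\ \dots,\ w_\ell2^{-2\ell d(u)}]^T,$$

with $\Sigma(u)$ and $\mathbf K(d(u))$ defined by (45) and (39), respectively.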
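To make the estimator and the limiting variance in (51) concrete, here is a minimal Python sketch. It assumes that the weights in (27) are the usual least-squares log-regression weights, i.e. $\sum_iw_i=0$ and $2\log(2)\sum_i i\,w_i=1$; these two constraints are exactly what makes $g(f^*(u,0)\mathbf K(d(u))[1,\dots,2^{2\ell d(u)}]^T)=d(u)$ hold in the proof of Theorem 3, but other weight vectors satisfying them are possible. The matrix $\Sigma(u)$ and the value $\mathbf K(d(u))$ are taken as inputs, since computing them requires (45) and (39). All function names are hypothetical.

```python
import numpy as np


def regression_weights(ell):
    """Least-squares weights w_0..w_ell with sum(w) = 0 and 2*log(2)*sum(i*w_i) = 1."""
    i = np.arange(ell + 1, dtype=float)
    c = i - i.mean()
    return c / (2.0 * np.log(2.0) * np.sum(c ** 2))


def d_hat(scalogram, w):
    """Log-regression estimate sum_i w_i log(sigma_hat^2_{L+i;T}(u))."""
    return float(np.dot(w, np.log(scalogram)))


def asymptotic_variance(w, Sigma, K_d, d):
    """V(u) = w_tilde^T Sigma w_tilde / K(d)^2, with w_tilde_i = w_i 2^{-2 i d}."""
    i = np.arange(len(w))
    w_tilde = w * 2.0 ** (-2.0 * i * d)
    return float(w_tilde @ Sigma @ w_tilde) / K_d ** 2


w = regression_weights(2)  # three scales, as in Section 5
L, d_true, level = 3, 0.3, 1.7
sigma2 = level * 2.0 ** (2 * d_true * (L + np.arange(3)))  # noise-free power law in j
print(round(d_hat(sigma2, w), 12))  # 0.3
```

On an exact power law $c\,2^{2jd}$ the estimate returns d regardless of c and L, which is the property used at the centering point of the δ-method; on data, plugging $\widehat d_T(L)$ into `asymptotic_variance` gives the interval lengths described in Section 5.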

5. Numerical examples

We used a Daubechies wavelet with M = 2 vanishing moments and Fourier decay α = 1.34 (see [13]). Hence our asymptotic results hold for $-0.84 < (1+\beta)/2-\alpha < d(u) < M = 2$ (the left bound −0.84 corresponds to choosing β arbitrarily small). In particular, d(u) is allowed to take values beyond the unit-root case (d(u) > 1).

5.1. Simulated data. We simulate a sample $X_{1,T},\dots,X_{T,T}$ of length $T = 2^{12}$ of a tvARFIMA(1,d,0) process, which has a local spectral density given by (13) with σ = 1, $\phi_1 = 0.8$ and

$$d(u) = (1-\cos(\pi u/2))/3,\qquad u\in[0,1].$$

The obtained simulated data are represented in Figure 1.

Figure 1. A simulated tvARFIMA(1,d,0) of length $T = 2^{12}$.

We compute the local estimator $\widehat\sigma^2_{j;T}(u)$ defined in (23), with $\{\gamma_{j,T}(k)\}$ given by the kernel weights on the one hand and by the recursive weights on the other hand, for $j = 1,2,\dots,5$ and a bandwidth $b_T = 0.25$. For the kernel weights we took the rectangular kernel $K = \mathbb 1_{[-1/2,1/2]}$. The obtained local scalograms $\widehat\sigma^2_{j;T}(u)$ of the local wavelet spectrum $\sigma_j^2(u)$, $j = 1,\dots,5$, $u\in[0,1]$, are represented in the lower parts of Figures 2 and 3, respectively, with a logarithmic y-axis. The five corresponding curves exhibit different variabilities: the larger j, the larger the variability, which is in accordance with our theoretical findings. In the upper parts of these two figures, we represent the true parameter d(u), $u\in[0,1]$ (plain black), and the corresponding estimators $\widehat d_T(u)$ for three sets of scales, namely j = 1, 2, 3 (blue line), j = 2, 3, 4 (green line) and j = 3, 4, 5 (red line), which correspond to L = 1, 2, 3, respectively, with ℓ = 2 in all three cases. The bars centered at each estimate $\widehat d_T(u)$ represent 0.95-level confidence intervals based on the asymptotic distribution given by (51). Since the asymptotic variance depends continuously on d(u), we plug in $\widehat d_T(u)$ to compute each interval length. Numerical computations are done using the toolbox described in […].

Figure 2. Local estimates as functions of $u\in[0,1]$ for the simulated tvARFIMA(1,d,0) using a two-sided rectangular kernel. Top (panel "Kernel Estimator of d using scales 1 2 3 to 3 4 5 resp."): $\widehat d_T(u)$ using scales j = 1, 2, 3 to 3, 4, 5 (respectively in blue, green and red) and the true value d(u) (in thin black). Bottom (panel "Kernel Estimator of the wavelet spectrum for 5 scales"): $\widehat\sigma^2_{j;T}(u)$ for j = 1, …, 5.

Figure 3. Same as Figure 2 using a recursive estimator.

One can observe the difference between the two-sided kernel estimator and the recursive estimator. The former exhibits a uniform behavior along time, with border effects close to each boundary of the interval [0, 1] (here we dropped the values of $\widehat d_T(u)$ for $u<b_T/2$ and $u>1-b_T/2$ to avoid these border effects). In contrast, the latter exhibits a decreasing, then stabilizing, variability along time; it is thus better adapted for estimating the right part of the interval. It is interesting to note that the choice of L is crucial for this simulated example. This is due to the presence of an autoregressive component leading to a strong positive short-memory autocorrelation with a root close to the unit circle. As a result, d(u) is overestimated if the band of scales used is too fine, i.e. if it reaches frequencies that are too high (as in the case L = 1), which explains why the true value mostly lies outside the corresponding confidence intervals. On the other hand, this bias diminishes drastically as soon as L ≥ 2, but for L = 3 the confidence intervals are larger, since the normalizing term $\delta_{L,T}$ is larger. This larger variance is matched by the fact that the estimates vary more widely for L = 3. We made similar experiments for a tvARFIMA(0,d,0) process; in this case, the bias is no longer observed for L = 1. We have also tried different values of the bandwidth $b_T$, which also influences the bias and the variability of the estimates in the expected way. Finally, we tested our procedures on longer series to check their numerical tractability. The computation of $\widehat d_T(u)$ from $X_{1,T},\dots,X_{T,T}$ with $T = 2^{15}$ took less than 1 second for the kernel estimator and 7 seconds for the recursive estimator on a 3.00 GHz CPU. We note that the recursive version is about ten times slower than the kernel estimator. On the other hand, the recursive estimator is adapted to online computation, that is, $\widehat\sigma^2_{j;T}(u)$ can be computed recursively for each newly available observation $X_{t,T}$.
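The two-sided kernel scalogram used in these simulations can be sketched as follows. This assumes, consistently with the Riemann-sum argument in the proof of Lemma 4, that the kernel weights take the normalized form $\gamma_{j,T}(k)\propto K((uT_j-k)/(b_TT_j))$; the rectangular kernel $K=\mathbb 1_{[-1/2,1/2]}$ and $b_T=0.25$ mirror the choices above, while the function names and the synthetic input (Gaussian noise with slowly increasing variance, standing in for wavelet coefficients) are purely illustrative.

```python
import numpy as np


def rect_kernel(x):
    """Rectangular kernel K = 1_[-1/2, 1/2]."""
    return (np.abs(x) <= 0.5).astype(float)


def kernel_weights(T_j, b, u, kernel=rect_kernel):
    """Two-sided weights gamma_{j,T}(k), k = 0, ..., T_j - 1, normalized to sum to one."""
    k = np.arange(T_j)
    g = kernel((u * T_j - k) / (b * T_j))
    total = g.sum()
    if total == 0:
        raise ValueError("no coefficient falls inside the kernel window")
    return g / total


def kernel_scalogram(coeffs, b, u, kernel=rect_kernel):
    """Local scalogram sum_k gamma_{j,T}(k) W_{j,k;T}^2 at rescaled time u."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(kernel_weights(len(coeffs), b, u, kernel) @ coeffs ** 2)


# As in Section 5.1, only evaluate for b/2 <= u <= 1 - b/2 to avoid border effects
b = 0.25
grid = np.linspace(b / 2, 1 - b / 2, 7)
rng = np.random.default_rng(1)
coeffs = rng.standard_normal(512) * np.sqrt(1 + np.arange(512) / 512)  # variance grows from 1 to 2
print([round(kernel_scalogram(coeffs, b, u), 2) for u in grid])
```

Unlike the recursive scheme, each evaluation here uses coefficients on both sides of $uT_j$, which explains the need to drop values near the boundaries of [0, 1].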
5.2. Real data sets. We now use a real data set made of a sample of realized log volatility of the YEN versus USD exchange rate between June 1986 and September 2004. The realized log volatility is represented in Figure 4. The series length is T = 4470, that is, of the same order as the previously simulated series ($T = 2^{12} = 4096$). Viewing the simulated data as a benchmark, we used approximately the same bandwidth parameter, $b_T = 0.23$, and the same sets of scales, namely L = 1, 2, 3 with ℓ = 3 in the three cases. The two-sided kernel estimators of the memory parameter are represented in the upper part of Figure 5. As previously, we also display the corresponding local scalograms in the lower part of the same figure. We omit the results for the recursive estimator for brevity. One can observe that here, as L increases, the estimates of d(u) globally increase, which may indicate a negative bias at high frequencies. For clarity, we only plot the confidence intervals for the first 10 estimates. Indeed, in contrast to the simulated case, the confidence intervals for the three sets of scales largely overlap, which indicates a less important bias. As in the simulated example, the green curve (L = 2) appears as a good compromise. It exhibits a 5-year periodic-like behavior, which seems to indicate that the long-memory parameter is not constant over time. This seems to be in accordance with the findings of [31], who model long-memory realized volatilities by a change of the model parameters from one regime to another, where the different regimes can be explained by the influence of changing market factors (such as the Asian financial crisis of 1998).

Figure 4. Realized log volatility of the YEN vs USD exchange rate from June 1986 to September 2004 (panel "Realized Log volatility YEN/USD exchange rate from June 1986 to Sept. 2004"; horizontal axis in years, 1988 to 2004).

Figure 5. Same as Figure 2 for the YEN vs USD exchange rate realized log volatility (panels "Kernel Estimator of d" and "Kernel Estimator of the wavelet spectrum for 5 scales"; horizontal axis in years, 1988 to 2004).

6. Conclusion

In this paper we have delivered a semi-parametric, hence fairly general, approach for estimating the time-varying long-memory parameter d(u) of a locally stationary process (or stationary increment process). Apart from modelling the singularity at zero frequency by the curve d(u), we do not need to model the time-varying spectrum of the remaining part explicitly. A wavelet log-regression estimator, already shown to perform well in the stationary situation, continues to work well here thanks to a localization of the wavelet scalograms across time within each scale.

The development of our approach is based on a weakly stationary approximation at each given time point u. As in the stationary case, because our semi-parametric spectral density is general and does not depend on only a finite number of parameters (as it does in […], e.g.), we need to concentrate on estimating well around frequency zero, where the amount of long-memory effect measured by d is visible. A somewhat delicate choice of the scales used in the log-regression is therefore needed: asymptotically, the estimator must involve more and more frequencies (i.e. scales), but with a maximal frequency tending to zero. In the wavelet domain, this means that the lowest scale used in the estimator is chosen so that i) the number of wavelet coefficients used in the estimator tends to infinity and ii) this lowest scale itself tends (slowly) to infinity.

Simulations have shown that, beyond being attractive from the point of view of asymptotic theory, our estimator performs reasonably well in practice. In our real data example, we adopt the approach of [31] and [10] and assume that realized volatilities of some exchange rates follow a long-memory model. We make the interesting observation that, for the observed series, the long-memory parameter can clearly not be considered constant over time, which suggests that the persistent correlation in these exchange rate data alternates between periods of stronger persistence and periods of weaker persistence.
Appendix A. Postponed proofs

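Proof of Proposition 1. By [23, Proposition 3], there is a constant $C_1$ such that, for all $j\ge0$ and all $\lambda\in[-\pi,\pi]$,

$$|H_j(\lambda)| \le C_1\,2^{j/2}\,|2^j\lambda|^M\,(1+2^j|\lambda|)^{-\alpha-M}. \qquad (52)$$

Applying (…), (15), (…) and (…), we get, for any $u\in\mathbb R$, $j\ge0$ and $k\in\{0,\dots,T_j-1\}$,

$$W_{j,k;T} = W_{j,k}(u) + R_{j,k}(u;T), \qquad (53)$$

where

$$R_{j,k}(u;T) = \int_{-\pi}^{\pi}\sum_{s\in\mathbb Z}h^{(p)}_{j,s}\,e^{i\lambda(2^jk-s)}\,\big\{A^0_{2^jk-s,T}(\lambda)-A(u;\lambda)\big\}\,dZ(\lambda).$$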



The main approximation result consists in bounding

$$S_j(u;T) = \sum_{k=0}^{T_j-1}\gamma_{j,T}(k)\,R^2_{j,k}(u;T)\qquad\text{and}\qquad D_j(u;T) = \sum_{k=0}^{T_j-1}\gamma_{j,T}(k)\,W_{j,k}(u)\,R_{j,k}(u;T).$$

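In the following, C denotes some multiplicative constant. Using (3), (…) and (65) in Lemma 1, we have

$$\Big|\sum_{s\in\mathbb Z}h^{(p)}_{j,s}\,\big\{A^0_{2^jk-s,T}(\lambda)-A(u;\lambda)\big\}\,e^{i\lambda(2^jk-s)}\Big| \le C\,2^{jp}\,|\lambda|^{-D}\,\big\{2^{j/2}\,|2^jk/T-u| + 2^{3j/2}/T\big\}.$$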

Recall that D denotes an exponent less than 1/2 which appears in Conditions (3) and (…). Using $D<1/2$, we get

$$E[R^2_{j,k}(u;T)] \le C\,2^{2jp}\,2^{3j}\,T^{-2}\,\{1+(k-Tu2^{-j})^2\}.$$

Since we assumed $\alpha > 1/2-d(u)$, we can take D large enough so that $1-\alpha-d(u) < D < 1/2$ (by adapting the constant c appearing in the aforementioned conditions). Hence we can assume in the following that

$$M > d(u)-1/2\qquad\text{and}\qquad d(u)+D+\alpha > 1. \qquad (54)$$
By (21), we also obtain that

$$|E[W_{j,k}(u)\,R_{j,k}(u;T)]| \le C\,2^{jp}\,\big\{2^{j/2}\,|2^jk/T-u| + 2^{3j/2}/T\big\}\int_{-\pi}^{\pi}|H_j(\lambda)\,A(u;\lambda)|\,|\lambda|^{-D}\,d\lambda.$$

Using (…), (…), (…), $f^*(u,\lambda)\le C f^*(u,0)$ (by (…)) and (52), we further have

$$\int_{-\pi}^{\pi}|H_j(\lambda)\,A(u;\lambda)|\,|\lambda|^{-D}\,d\lambda \le C\int_{-\pi}^{\pi}|H_j(\lambda)|\,\sqrt{f(u,\lambda)}\,|\lambda|^{-D}\,d\lambda \le C\,\sqrt{f^*(u,0)}\,2^{j(d(u)+D-1/2)},$$

where we used that $\int_{\mathbb R}|\xi|^{M-d(u)-D}(1+|\xi|)^{-\alpha-M}\,d\xi<\infty$ by (54). The last displays provide simple bounds for the expectations of $S_j$ and $D_j$.

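To bound their variances, we use [27, Theorem 2, page 34], which yields

$$\mathrm{Cov}\big(R^2_{j,k}(u;T),R^2_{j,k'}(u;T)\big) = 2\,\mathrm{Cov}^2\big(R_{j,k}(u;T),R_{j,k'}(u;T)\big) + \mathrm{Cum}\big(R_{j,k}(u;T),R_{j,k}(u;T),R_{j,k'}(u;T),R_{j,k'}(u;T)\big)$$

and

$$\begin{aligned}\mathrm{Cov}\big(W_{j,k}(u)R_{j,k}(u;T),\,W_{j,k'}(u)R_{j,k'}(u;T)\big) &= \mathrm{Cov}\big(W_{j,k}(u),W_{j,k'}(u)\big)\,\mathrm{Cov}\big(R_{j,k}(u;T),R_{j,k'}(u;T)\big)\\ &\quad+\mathrm{Cov}\big(R_{j,k}(u;T),W_{j,k'}(u)\big)\,\mathrm{Cov}\big(W_{j,k}(u),R_{j,k'}(u;T)\big)\\ &\quad+\mathrm{Cum}\big(W_{j,k}(u),R_{j,k}(u;T),W_{j,k'}(u),R_{j,k'}(u;T)\big).\end{aligned}$$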

Let us first provide bounds on $E[R_{j,k}(u;T)R_{j,k'}(u;T)]$ and $E[W_{j,k}(u)R_{j,k'}(u;T)]$ for $k,k' = 0,\dots,T_j-1$. Proceeding as previously and using (54), we get (in fact, the cases above with k = k' are particular cases)

$$|E[R_{j,k}(u;T)\,R_{j,k'}(u;T)]| \le C\,2^{2jp}\,2^{3j}\,T^{-2}\,\{1+|k-Tu2^{-j}|\}\,\{1+|k'-Tu2^{-j}|\}$$

and

$$|E[W_{j,k}(u)\,R_{j,k'}(u;T)]| \le C\,2^{jp}\,\sqrt{f^*(u,0)}\,2^{j(d(u)+D-1/2)}\,\big\{2^{j/2}\,|2^jk'/T-u| + 2^{3j/2}/T\big\}.$$
Using (…), we further get

$$|\mathrm{Cum}(R_{j,k}(u;T),R_{j,k}(u;T),R_{j,k'}(u;T),R_{j,k'}(u;T))| \le C\,2^{4jp}\,2^{6j}\,T^{-4}\,\{1+(k-Tu2^{-j})^2\}\,\{1+(k'-Tu2^{-j})^2\},$$

and, denoting by $B_j(u)$ the variance of the (weakly stationary) process $\{W_{j,k}(u),\ k\in\mathbb Z\}$,

$$|\mathrm{Cum}(W_{j,k}(u),R_{j,k}(u;T),W_{j,k'}(u),R_{j,k'}(u;T))| \le C\,B_j(u)\,2^{2jp}\,2^{3j}\,T^{-2}\,\{1+|k-Tu2^{-j}|\}\,\{1+|k'-Tu2^{-j}|\}.$$

Gathering these bounds, we obtain the same bound for $\mathrm{Var}^{1/2}(S_j(u;T))$ as for $E[S_j(u;T)]$ and thus, using the definition of Γ in (25),

$$\big(E[S_j^2(u;T)]\big)^{1/2} \le C\,2^{2jp}\,2^{3j}\,T^{-2}\,\{\Gamma(u;j,T)+\Gamma_2(u;j,T)\}. \qquad (55)$$
For $D_j(u;T)$, we obtain

$$|E[D_j(u;T)]| \le C\,2^{jp}\,\sqrt{f^*(u,0)}\,2^{j(d(u)+D-1/2)}\,2^{3j/2}\,T^{-1}\,\{\Gamma(u;j,T)+\Gamma_1(u;j,T)\}.$$

We then obtain that $\mathrm{Var}^{1/2}(D_j(u;T))$ is at most

$$C\,2^{jp}\,2^{3j/2}\,T^{-1}\,\{\Gamma(u;j,T)+\Gamma_1(u;j,T)\}\,\big\{B_j^{1/2}(u)+\sqrt{f^*(u,0)}\,2^{j(d(u)+D-1/2)}\big\}.$$

Observe that, by [23, Theorem 1], since $M > d(u)-1/2$ and $\alpha > 1/2-d(u)$, we have $B_j(u)\le C\,f^*(u,0)\,2^{2d(u)j}$. Hence, since $D<1/2$,

$$\big(E[D_j^2(u;T)]\big)^{1/2} \le C\,2^{jp}\,2^{j(3/2+d(u))}\,T^{-1}\,\{\Gamma(u;j,T)+\Gamma_1(u;j,T)\}. \qquad (56)$$

By (23) and (53), we have

$$\widehat\sigma^2_{j;T}(u) = \bar\sigma^2_{j;T}(u) + S_j(u;T) + 2D_j(u;T), \qquad (57)$$

where $\bar\sigma^2_{j;T}(u)$ is defined in (25). The bound (38) now follows from (55), (56), (57) and Assumption (…). □

Proof of Theorem 1. By (22) and (23),

$$E[\bar\sigma^2_{j;T}(u)] = E[W^2_{j,k}(u)] = \sigma_j^2(u). \qquad (58)$$

Since the wavelet coefficients (20) are those of a weakly stationary process, their behavior at large scales ($j\to\infty$) can be studied using [23, Theorem 1]. By [23, Theorem 1], since we assumed (7), $M > d(u)-1/2$ and $\alpha > (1+\beta)/2-d(u)$, we obtain (40). In the following we denote

$$\mathbf K^* = f^*(u,0)\,\mathbf K(d(u)).$$

We now provide a bound for

$$\mathrm{Var}(\bar\sigma^2_{j;T}(u)) = \sum_{k,k'=0}^{T_j-1}\gamma_{j,T}(k)\,\gamma_{j,T}(k')\,\mathrm{Cov}\big(W^2_{j,k}(u),W^2_{j,k'}(u)\big) = V_1+V_2,$$

where the decomposition into $V_1+V_2$ follows from that of $\mathrm{Cov}(W^2_{j,k}(u),W^2_{j,k'}(u))$ into

$$2\,\mathrm{Cov}^2\big(W_{j,k}(u),W_{j,k'}(u)\big) + \mathrm{Cum}\big(W_{j,k}(u),W_{j,k}(u),W_{j,k'}(u),W_{j,k'}(u)\big).$$
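We easily obtain that

$$V_1 = 2\int_{-\pi}^{\pi}|\Phi_{j,T}(\lambda;0,0)|^2\,\mathrm D_j^{*2}(\lambda)\,d\lambda, \qquad (59)$$

where $\mathrm D_j$ denotes the spectral density of the weakly stationary process $\{W_{j,k}(u),\ k\in\mathbb Z\}$, $\Phi_{j,T}$ is defined in (29) and, for any (2π)-periodic function g, $g^{*2}(\lambda) = g*g(\lambda) = \int_{-\pi}^{\pi}g(\lambda-\zeta)\,g(\zeta)\,d\zeta$. Moreover, applying (8) with (21) and (29), we get that $V_2$ can be expressed as

$$V_2 = \kappa_4\int\Big[\prod_{m=1}^{4}N_j(\lambda_m)\Big]\,\Phi_{j,T}(2^j(\lambda_1+\lambda_2);0,0)\,\Phi_{j,T}(2^j(\lambda_3+\lambda_4);0,0)\,d\mu(\boldsymbol\lambda).$$

Hence, bounding $|\kappa_4|$, using (…) and setting $\lambda = \lambda_1+\lambda_2$, we have

$$|V_2| \le |\kappa_4|\int_{-\pi}^{\pi}|N_j^{*2}(\lambda)|^2\,|\Phi_{j,T}(2^j\lambda;0,0)|^2\,d\lambda, \qquad (60)$$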
where we set $N_j(\lambda) = H_j(\lambda)\,A(u;\lambda)$. Observe that $\|N_j^{*2}\|_\infty\le\|N_j\|_2^2 = \|\mathrm D_j\|_1$ and $\|\mathrm D_j^{*2}\|_\infty\le\|\mathrm D_j\|_2^2$. Now, by [23, Theorem 1], since $M > d(u)$ and $\alpha > 1/2-d(u)$, we have $\|\mathrm D_j\|_\infty = O(2^{2jd(u)})$ (the constants depend on $f^*(u,\cdot)$ only), which implies bounds of the same order for $\|\mathrm D_j\|_1$ and $\|\mathrm D_j\|_2$. Using (59), (60), Assumption 2(ii) with $i = i' = v = v' = 0$, and observing that, by (2π)-periodicity of $|\Phi_{j,T}(\lambda;0,0)|$,

$$\int_{-\pi}^{\pi}|\Phi_{j,T}(2^j\lambda;0,0)|^2\,d\lambda = \int_{-\pi}^{\pi}|\Phi_{j,T}(\lambda;0,0)|^2\,d\lambda,$$

we finally get that

$$\mathrm{Var}(\bar\sigma^2_{j;T}(u)) = O\big(2^{4jd(u)}\,\delta_{j,T}\big). \qquad (61)$$
Using (57) and (58), $E[(\widehat\sigma^2_{j;T}(u)-\mathbf K^*\,2^{2jd(u)})^2]$ is at most

$$\begin{aligned}&C\,\big\{\mathrm{Var}(\bar\sigma^2_{j;T}(u)) + E[S_j^2(u;T)] + E[D_j^2(u;T)]\big\} + O\big(2^{2(2d(u)-\beta)j}\big)\\ &\quad= O\big(2^{4jd(u)}\,\delta_{j,T} + 2^{(6+4p)j}\,T^{-4}\,\delta_{j,T}^{-4} + 2^{(3+2p+2d(u))j}\,T^{-2}\,\delta_{j,T}^{-2} + 2^{2(2d(u)-\beta)j}\big),\end{aligned}$$

where we used (61), (55), (56) and (33). Using (41), the last display gives (42). □

Proof of Theorem 2. Under the set of Assumptions (a), the proof immediately follows from [28, Theorem 2]. We now consider the set of Assumptions (b). In this case, we rely on the Gaussian assumption. The proof follows the lines of [21, Theorem 2], in which the stationary case is considered, i.e. with constant weights $\gamma_{j,T}(k)$. We first observe that, for any $\boldsymbol\mu = [\mu_0\ \cdots\ \mu_\ell]^T\in\mathbb R^{\ell+1}$, we may write

$$\boldsymbol\mu^T\,\bar{\mathbf s}_L(u) = \boldsymbol\xi_L^T\,A_L\,\boldsymbol\xi_L,$$

where $\boldsymbol\xi_L$ is a Gaussian vector with entries $(W_{L+i,k}(u))_{0\le i\le\ell,\ 0\le k<T_{L+i}}$ and $A_L$ is the diagonal matrix with diagonal entries $\big(2^{-2Ld(u)}\,\delta_{L,T}^{-1/2}\,\mu_i\,\gamma_{L+i,T}(k)\big)_{0\le i\le\ell,\ 0\le k<T_{L+i}}$. We may thus apply [22, Lemma 12].

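To obtain (43), it is thus sufficient to show that

$$\rho(A_L)\,\rho(\mathrm{Cov}(\boldsymbol\xi_L))\to0, \qquad (62)$$

where ρ(A) denotes the spectral radius of A, and

$$\mathrm{Var}\big(\boldsymbol\mu^T\bar{\mathbf s}_L(u)\big)\to(f^*(u,0))^2\,\boldsymbol\mu^T\Sigma(u)\,\boldsymbol\mu. \qquad (63)$$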
We have, by (31) and Assumption (…),

$$\rho(A_L) \le 2^{-2Ld(u)}\,\delta_{L,T}^{-1/2}\,\max_{0\le i\le\ell}|\mu_i|\,\max_{0\le i\le\ell}\delta_{L+i,T} = o\big(2^{-2Ld(u)}\big).$$

Using [21, Lemma 6], [22, Lemma 11] and the fact that $\mathrm D_{L+i}$ is the spectral density of the process $\{W_{L+i,k}(u),\ k\in\mathbb Z\}$, we have

$$\rho(\mathrm{Cov}(\boldsymbol\xi_L)) \le \sum_{i=0}^{\ell}\rho\big(\mathrm{Cov}([W_{L+i,k}(u),\ k=0,\dots,T_{L+i}-1])\big) \le 2\pi\sum_{i=0}^{\ell}\|\mathrm D_{L+i}\|_\infty.$$

By [23, Theorem 1], since we assumed $M > d(u)$ and $\alpha > 1/2-d(u)$, we have $\|\mathrm D_{L+i}\|_\infty = O(2^{2Ld(u)})$. This, with the last two displays, implies (62).
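We now compute the asymptotic covariance matrix of $\bar{\mathbf s}_L(u)$. Let $0\le j'\le j$. Using (30) and the Gaussian assumption, we have

$$\begin{aligned}\mathrm{Cov}\big(\bar\sigma^2_{j;T}(u),\bar\sigma^2_{j';T}(u)\big) &= \sum_{k=0}^{T_j-1}\sum_{k'=0}^{T_{j'}-1}\gamma_{j,T}(k)\,\gamma_{j',T}(k')\,\mathrm{Cov}\big(W^2_{j,k}(u),W^2_{j',k'}(u)\big)\\ &= 2\sum_{v=0}^{2^{j-j'}-1}\sum_{k=0}^{T_j-1}\sum_{l\in\mathcal T_j(j-j',v)}\gamma_{j,T}(k)\,\gamma_{j',T}(2^{j-j'}l+v)\,\mathrm{Cov}^2\big(W_{j,k}(u),W_{j',2^{j-j'}l+v}(u)\big).\end{aligned}$$

Using [23, Corollary 1], we have

$$\mathrm{Cov}\big(W_{j,k}(u),W_{j',2^{j-j'}l+v}(u)\big) = \int_{-\pi}^{\pi}\mathrm D_{j,j-j',v}(\lambda)\,e^{i\lambda(k-l)}\,d\lambda,$$

where $\mathbf D_{j,j-j'} = [\mathrm D_{j,j-j',v}]_{v=0,\dots,2^{j-j'}-1}$ denotes the $2^{j-j'}$-dimensional cross-spectral density between $W_{j,k}(u)$ and $[W_{j',2^{j-j'}k+v}(u)]_{v=0,\dots,2^{j-j'}-1}$. It follows from the last two displays and (29) that

$$\mathrm{Cov}\big(\bar\sigma^2_{j;T}(u),\bar\sigma^2_{j';T}(u)\big) = 2\sum_{v=0}^{2^{j-j'}-1}\int_{-\pi}^{\pi}\Phi_{j,T}(\lambda;0,0)\,\Phi_{j,T}(\lambda;j-j',v)\,\mathrm D^{*2}_{j,j-j',v}(\lambda)\,d\lambda,$$

where

$$\mathrm D^{*2}_{j,j-j',v}(\lambda) = \int_{-\pi}^{\pi}\mathrm D_{j,j-j',v}(\zeta)\,\overline{\mathrm D_{j,j-j',v}(\zeta-\lambda)}\,d\zeta.$$

By [23, Theorem 1(b)], since we assumed $M > d(u)$ and $\alpha > 1/2-d(u)$, using (…), we have, for $j = L+i$ and $j' = L+i'$ with $0\le i'\le i$ fixed,

$$\lim_{L\to\infty}\ \sup_{\lambda\in[-\pi,\pi]}\big|2^{-2d(u)j}\,\mathbf D_{j,j-j'}(\lambda) - f^*(u,0)\,\mathbf D_{\infty,i-i'}(\lambda;d(u))\big| = 0.$$

The last three displays, (45), Lemma 2 and Assumption 2 yield

$$\mathrm{Cov}(\bar{\mathbf s}_L(u))\to(f^*(u,0))^2\,\Sigma(u),$$

and hence (63). □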



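Proof of Theorem 3. We first show that

$$\delta_{L,T}^{-1/2}\Big(2^{-2Ld(u)}\,[\widehat\sigma^2_{L;T}(u)\ \dots\ \widehat\sigma^2_{L+\ell;T}(u)]^T - \mathbf K^*\,[1\ \ 2^{2d(u)}\ \dots\ 2^{2\ell d(u)}]^T\Big) \Rightarrow \mathcal N\big(0,(f^*(u,0))^2\,\Sigma(u)\big). \qquad (64)$$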



Observe that the weak convergence (64) is the same as (47) except for the centering term. Relation (47) is valid since the assumptions of Corollary 1 hold. Applying $\delta_{L,T}\to0$, Proposition 1 and the left-hand side condition of (50), we have that, for any $j = L+i$ with a fixed $i = 0,\dots,\ell$,

$$\delta_{L,T}^{-1/2}\,2^{-2Ld(u)}\,E[\widehat\sigma^2_{j;T}(u)] = \delta_{L,T}^{-1/2}\,2^{-2Ld(u)}\,E[\bar\sigma^2_{j;T}(u)] + o(1).$$

The bias control (40) and the right-hand side condition of (50) then imply

$$\delta_{L,T}^{-1/2}\,2^{-2Ld(u)}\,E[\widehat\sigma^2_{j;T}(u)] = \delta_{L,T}^{-1/2}\,f^*(u,0)\,\mathbf K(d(u))\,2^{2id(u)} + o(1).$$
This, with (47), gives the weak convergence (64).

The convergence (51) now follows from (64) by applying the δ-method as in [21, Proposition 3]. Indeed, define

$$g(\mathbf x) = \sum_{i=0}^{\ell}w_i\log(x_i)\qquad\text{for all }\mathbf x = [x_0\ \dots\ x_\ell]^T.$$

Observe that, by (27) and (28), we have

$$g\big(2^{-2Ld(u)}\,[\widehat\sigma^2_{L;T}(u)\ \ \widehat\sigma^2_{L+1;T}(u)\ \dots\ \widehat\sigma^2_{L+\ell;T}(u)]^T\big) = \widehat d_T(L)$$

and

$$g\big(f^*(u,0)\,\mathbf K(d(u))\,[1\ \dots\ 2^{2\ell d(u)}]^T\big) = d(u).$$

Thus (51) follows from (64) by computing the gradient of g at the centering term,

$$\nabla g\big(f^*(u,0)\,\mathbf K(d(u))\,[1\ \ 2^{2d(u)}\ \dots\ 2^{2\ell d(u)}]^T\big) = \frac{[w_0\ \ w_1\,2^{-2d(u)}\ \dots\ w_\ell\,2^{-2\ell d(u)}]^T}{f^*(u,0)\,\mathbf K(d(u))}.$$

□
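The gradient used in the δ-method can be checked numerically. The sketch below compares a central finite-difference gradient of $g(\mathbf x)=\sum_iw_i\log x_i$ with the closed form above at the centering point; the values of d, of $f^*(u,0)\mathbf K(d(u))$ (called `c` here) and the three-scale least-squares weights are arbitrary illustrations, and the helper names are hypothetical.

```python
import numpy as np


def g(x, w):
    """g(x) = sum_i w_i log(x_i)."""
    return float(np.dot(w, np.log(x)))


def grad_closed_form(w, c, d):
    """Gradient of g at c * [1, 2^{2d}, ..., 2^{2 ell d}]: entries w_i 2^{-2 i d} / c."""
    i = np.arange(len(w))
    return w * 2.0 ** (-2.0 * i * d) / c


def grad_finite_diff(w, x, h=1e-6):
    """Central finite differences with a step proportional to each coordinate."""
    grad = np.empty_like(x)
    for m in range(len(x)):
        e = np.zeros_like(x)
        e[m] = h * x[m]
        grad[m] = (g(x + e, w) - g(x - e, w)) / (2 * e[m])
    return grad


w = np.array([-1.0, 0.0, 1.0]) / (4 * np.log(2))  # sum(w) = 0 and 2 log(2) sum(i w_i) = 1
c, d = 2.5, 0.35                                   # c plays the role of f*(u,0) K(d(u))
x0 = c * 2.0 ** (2 * d * np.arange(3))             # centering point of (64)
print(np.allclose(grad_finite_diff(w, x0), grad_closed_form(w, c, d), rtol=1e-6))  # True
```

Evaluating `g(x0, w)` also returns d, which is the identity $g(f^*(u,0)\mathbf K(d(u))[1\ \dots\ 2^{2\ell d(u)}]^T) = d(u)$ used above.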



Appendix B. Technical lemmas 



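Lemma 1. Assume (W-1)-(W-4). Let $h_j$ be the wavelet detail filter at scale index j and $h^{(p)}_j$ any factorization of it by $\Delta^p$ with $p\in\{0,\dots,M\}$. Then we have

$$\sum_{s\in\mathbb Z}|h^{(p)}_{j,s}| \le C\,2^{j(p+1/2)}\qquad\text{and}\qquad\sum_{s\in\mathbb Z}(1+|s|)\,|h^{(p)}_{j,s}| \le C\,2^{j(p+3/2)}. \qquad (65)$$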



Lemma 2. Suppose Assumption 2 holds. Let $i,i'\ge0$, $v\in\{0,\dots,2^i-1\}$ and $v'\in\{0,\dots,2^{i'}-1\}$. Define, for any (2π)-periodic function g,

$$I_T(g) = \delta_{j,T}^{-1}\int_{-\pi}^{\pi}\Phi_{j,T}(\lambda;i,v)\,\Phi_{j,T}(\lambda;i',v')\,g(\lambda)\,d\lambda.$$

Then the two following assertions hold.

(i) If $h\to g$ in $L^\infty([-\pi,\pi])$, then $\sup_{T>0}|I_T(h)-I_T(g)|\to0$.

(ii) If $g\in L^\infty([-\pi,\pi])$ is continuous at zero, then, as $T\to\infty$, $I_T(g)\to V(i,v;i',v')\,g(0)$.

Proof. By linearity of $I_T$, we may take g = 0 to prove Assertion (i). We have, by the Cauchy-Schwarz inequality,

$$|I_T(h)| \le \|h\|_\infty\,\big(\delta_{j,T}^{-1/2}\,\|\Phi_{j,T}(\cdot;i,v)\|_2\big)\,\big(\delta_{j,T}^{-1/2}\,\|\Phi_{j,T}(\cdot;i',v')\|_2\big).$$

Using Assumption 2(ii), the terms between brackets are bounded independently of j, and we obtain (i).

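We now prove (ii). By linearity of $I_T$, we may assume g(0) = 1. By Assumption 2(ii), we have $I_T(1)\to V(i,v;i',v')$. On the other hand, we have, for any η > 0,

$$|I_T(g)-I_T(1)| = \big|I_T\big((g-1)\mathbb 1_{[-\eta,\eta]} + (g-1)\mathbb 1_{[-\eta,\eta]^c}\big)\big| \le \big|I_T\big((g-1)\mathbb 1_{[-\eta,\eta]}\big)\big| + \big|I_T\big((g-1)\mathbb 1_{[-\eta,\eta]^c}\big)\big|.$$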
Observe that, by continuity of g at the origin, $\|(g-1)\mathbb 1_{[-\eta,\eta]}\|_\infty\to0$ as $\eta\to0$. By (i), we get $\sup_{T>0}|I_T((g-1)\mathbb 1_{[-\eta,\eta]})|\to0$ as $\eta\to0$. It thus only remains to show that $|I_T((g-1)\mathbb 1_{[-\eta,\eta]^c})|\to0$ for any $\eta>0$. This follows from the bound

$$\big|I_T\big((g-1)\mathbb 1_{[-\eta,\eta]^c}\big)\big| \le 2\pi\,\|g-1\|_\infty\,\Big(\delta_{j,T}^{-1/2}\sup_{\eta\le|\lambda|\le\pi}|\Phi_{j,T}(\lambda;i,v)|\Big)\,\Big(\delta_{j,T}^{-1/2}\sup_{\eta\le|\lambda|\le\pi}|\Phi_{j,T}(\lambda;i',v')|\Big)$$

and by applying Assumption 2(iii). □
Lemma 3. For any $\bar a>0$ and $b>0$, there exists $c>0$ such that

$$|(\dots)-1| \le c\,\{1+\log(|z|)\}\,a\qquad\text{for all }a\in[0,\bar a]\text{ and }z\in\mathbb C\text{ with }|z|\le b.$$
Lemma 4. Assume one of the following.

(K-1) $K = \mathbb 1_{[-1/2,1/2]}$.

(K-2) K is compactly supported and $|\widehat K(\xi)| = o(|\xi|^{-3/2})$ as $|\xi|\to\infty$, where $\widehat K$ denotes the Fourier transform of K.

(K-3) K is integrable and $\widehat K$ has an exponential decay, i.e. for some $c>0$, $|\widehat K(\xi)| = O(\exp(-c|\xi|))$ as $|\xi|\to\infty$; $K(t) = O(|t|^{-p_0})$ as $|t|\to\infty$ for some $p_0>3$; the derivative $K'$ of K satisfies $|K'(t)| = O(|t|^{-p_1})$ as $|t|\to\infty$ for some $p_1>1$; and $T_j\exp(-c'\,b_TT_j) = O(1)$ for any $c'>0$.

Suppose that $b_T\to0$ and that j depends on T so that $T_jb_T\to\infty$ as $T\to\infty$. Then, for weights given by (35), Assumption 2 is satisfied with

$$\delta_{j,T}\sim\frac{\|K\|_2^2}{\|K\|_1^2}\,(b_TT_j)^{-1} \qquad (66)$$

and

$$V(i,v;i',v') = 2\pi\,2^{-i-i'},\qquad i,i'\ge0,\ 0\le v<2^i,\ 0\le v'<2^{i'}. \qquad (67)$$
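The equivalence $\rho_{j,T}\sim\|K\|_1\,b_TT_j$ established at the start of the proof below is a Riemann-sum statement that is easy to check numerically. The sketch assumes $\rho_{j,T}=\sum_{k=0}^{T_j-1}K((uT_j-k)/(b_TT_j))$, the normalizer suggested by the proof, and uses the rectangular kernel of (K-1) and, as an example of (K-2), the Epanechnikov kernel $K(t)=\tfrac34(1-t^2)_+$, whose Fourier transform decays like $|\xi|^{-2}$; both kernels have $\|K\|_1=1$. The function names are hypothetical.

```python
import numpy as np


def rho(T_j, b, u, kernel):
    """Normalizer rho_{j,T} = sum_k K((u T_j - k) / (b T_j)), k = 0, ..., T_j - 1."""
    k = np.arange(T_j)
    return kernel((u * T_j - k) / (b * T_j)).sum()


def rect(t):
    """(K-1): rectangular kernel, ||K||_1 = 1."""
    return (np.abs(t) <= 0.5).astype(float)


def epanechnikov(t):
    """Example of (K-2): compactly supported, ||K||_1 = 1."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)


b, u = 0.1, 0.5
for T_j in (2 ** 8, 2 ** 11, 2 ** 14):
    ratios = [rho(T_j, b, u, K) / (b * T_j) for K in (rect, epanechnikov)]
    print(T_j, [round(r, 4) for r in ratios])  # ratios approach ||K||_1 = 1 as b * T_j grows
```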

Proof. For convenience, we omit the subscripts T and j,T in this proof when no ambiguity arises. Under (K-1), one has $\rho = bT_j+O(1)$. Under (K-2), K is uniformly continuous on its compact support S and, since $u\in(0,1)$, $b\to0$ and $T_jb\to\infty$, S eventually falls between the extremal points of $\{(uT_j-k)/(bT_j),\ k=0,\dots,T_j-1\}$. Thus,

$$(bT_j)^{-1}\sum_{k=0}^{T_j-1}K\big((uT_j-k)/(bT_j)\big)\to\int K(s)\,ds = \|K\|_1.$$

Under (K-3), using that $|K'(t)|\le c(1+|t|)^{-p_1}$ for some $p_1>1$ and $c>0$, we get

$$(bT_j)^{-1}\sum_{k=0}^{T_j-1}K\big((uT_j-k)/(bT_j)\big) - \int_{(u-1)/b}^{u/b}K(s)\,ds = O\Big((bT_j)^{-2}\sum_{l\ge0}(1+l/(bT_j))^{-p_1}\Big) = O\big((bT_j)^{-1}\big).$$

Hence the last three displays yield that, in all cases,

$$\rho_{j,T}\sim\|K\|_1\,(b_TT_j). \qquad (68)$$

The asymptotic equivalence (66) then follows from the definitions (35) and (31), and we obtain Assumption 2(i) by (…).
Let us now prove that Assumption 2(ii) holds under (K-1), (K-2) and (K-3), successively. Note that, by definition (29), we have

$$\int_{-\pi}^{\pi}\Phi_{j,T}(\lambda;i,v)\,\Phi_{j,T}(\lambda;i',v')\,d\lambda = 2\pi\sum_{l\in\mathcal T_j(i,v)\cap\mathcal T_j(i',v')}\gamma_{j-i,T}(2^il+v)\,\gamma_{j-i',T}(2^{i'}l+v'). \qquad (69)$$

Under (K-1), using $2^{-i}T_{j-i}\sim2^{-i'}T_{j-i'}\sim T_j$ by (16), $bT_j\to\infty$ and $b\to0$, we easily get that the supports of the sequences $\{\gamma_{j-i,T}(2^il+v),\ l\ge0\}$ and $\{\gamma_{j-i',T}(2^{i'}l+v'),\ l\ge0\}$ are eventually included in $\mathcal T_j(i,v)\cap\mathcal T_j(i',v')$ and that their intersection has a length asymptotically equivalent to $bT_j$. Hence, using (69) and (68) with $\|K\|_1 = \|K\|_\infty = 1$, we obtain that, in this case,

$$\int_{-\pi}^{\pi}\Phi_{j,T}(\lambda;i,v)\,\Phi_{j,T}(\lambda;i',v')\,d\lambda\sim2\pi\,\frac{T_j}{b\,T_{j-i}\,T_{j-i'}}.$$

By (16), this is Assumption 2(ii) with $V(i,v;i',v') = 2\pi\,2^{-i-i'}$, which coincides with (67) under (K-1).

Under (K-[2]) , we proceed by interpreting the sum in (|69p as a Riemann approximation of J K 2 
up to a normalization factor. For I E Tj(i,v) D7j(i' ,v'), we approximate 

$$\begin{aligned}J'_l&=(bT_j)^{-1}\rho_{j-i,T}\,\rho_{j-i',T}\,\mu_{j-i,T}(2^il+v)\,\mu_{j-i',T}(2^{i'}l+v')\\&=(bT_j)^{-1}K\big(\{uT_{j-i}-(2^il+v)\}/\{bT_{j-i}\}\big)\,K\big(\{uT_{j-i'}-(2^{i'}l+v')\}/\{bT_{j-i'}\}\big)\end{aligned}$$

by the local average

$$J_l=\int_{I_l}K^2(s)\,ds\,,$$

where $I_l$ is the interval $[\{uT_j-(l+1)\}/\{bT_j\},\{uT_j-l\}/\{bT_j\}]$. Observe that

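$$\sup_{s\in I_l}\big|s-\{uT_{j-i}-(2^il+v)\}/\{bT_{j-i}\}\big|\leq\frac{1}{bT_j}+\left|\frac{uT_j-l}{bT_j}-\frac{uT_{j-i}-(2^il+v)}{bT_{j-i}}\right|\,.$$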

Using (16), $i,v=O(1)$ and $l=O(T_j)$, we obtain, for any fixed integers $i$ and $v$,

$$\sup_{0\leq l\leq2T_j}\ \sup_{s\in I_l}\big|s-\{uT_{j-i}-(2^il+v)\}/\{bT_{j-i}\}\big|=O\big((bT_j)^{-1}\big)\,,\tag{70}$$

and the same holds if $i,v$ are replaced by $i',v'$. Note that

$$\mathcal{T}_j(i,v)\cap\mathcal{T}_j(i',v')=\big\{0,1,\dots,\{2^{-i}(T_{j-i}-v)\}\wedge\{2^{-i'}(T_{j-i'}-v')\}-1\big\}\,,$$

which, by (16) and the fact that $K$ is compactly supported, is eventually contained in $\{0,1,\dots,2T_j\}$ and eventually contains the set of indices $l$ such that $J_l\neq0$, a set of size $O(bT_j)$. By (70), we also see that, outside a set of length $O(bT_j)$, both $J'_l$ and $J_l$ vanish. Hence we have



$$\Big|\sum_{l\in\mathcal{T}_j(i,v)\cap\mathcal{T}_j(i',v')}(J'_l-J_l)\Big|=O\Big(bT_j\sup_{l\in\mathcal{T}_j(i,v)\cap\mathcal{T}_j(i',v')}|J'_l-J_l|\Big)\,.$$

Using (70) and the uniform continuity of $K$, there exists a constant $c$ such that

$$\sup_l|J'_l-J_l|\leq(bT_j)^{-1}\sup_{|s-t|,\,|s-t'|\leq c/(bT_j)}\big|K^2(s)-K(t)K(t')\big|=o\big((bT_j)^{-1}\big)\,.$$
The last two displays, (69) and the definitions of $J'_l$ and $J_l$ thus yield

$$(bT_j)^{-1}\rho_{j-i,T}\,\rho_{j-i',T}\int_{-\pi}^{\pi}\Phi_{j,T}(\lambda;i,v)\,\overline{\Phi_{j,T}(\lambda;i',v')}\,d\lambda\sim2\pi\int K^2(s)\,ds=2\pi\|K\|_2^2\,.\tag{71}$$



By (66) and (68), this gives Assumption 2(ii) with $V(i,v;i',v')$ given by (67).
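As a numerical sanity check of (71), and of the (K-1) computation above, one can evaluate the left-hand side of (71) through (69). The sketch makes two illustrative assumptions of our own: $T_{j-i}=2^iT_j$ exactly (a simplification of (16)), and $\mu_{j,T}(k)=K((uT_j-k)/(bT_j))/\rho_{j,T}$ with $\rho_{j,T}$ the sum of the kernel values. The result should be close to $2\pi\|K\|_2^2$, that is $2\pi$ for the indicator of $[-1/2,1/2]$ and $4\pi/3$ for the triangular kernel. Helper names are hypothetical.

```python
import numpy as np


def weights(K, u, b, n):
    """Kernel weights K((u*n - k)/(b*n)) / rho for k = 0..n-1, together with rho."""
    k = np.arange(n)
    w = K((u * n - k) / (b * n))
    r = w.sum()
    return w / r, r


def lhs_71(K, u, b, Tj, i, v, ip, vp):
    """(b*Tj)^{-1} rho_{j-i} rho_{j-i'} * 2*pi * sum_l mu_{j-i}(2^i l+v) mu_{j-i'}(2^{i'} l+v')."""
    mu1, r1 = weights(K, u, b, 2**i * Tj)     # level j-i has 2^i * Tj coefficients (assumed)
    mu2, r2 = weights(K, u, b, 2**ip * Tj)
    l = np.arange(Tj)                          # common index set (v < 2^i, v' < 2^{i'})
    cross = np.sum(mu1[2**i * l + v] * mu2[2**ip * l + vp])
    return r1 * r2 * 2 * np.pi * cross / (b * Tj)


def indicator(s):
    # Indicator of [-1/2, 1/2]: ||K||_1 = ||K||_2^2 = 1.
    return (np.abs(s) <= 0.5).astype(float)


def triangular(s):
    # Triangular kernel: ||K||_1 = 1, ||K||_2^2 = 2/3.
    return np.clip(1.0 - np.abs(s), 0.0, None)


u, b, Tj = 0.5, 0.05, 20_000
print(lhs_71(indicator, u, b, Tj, 1, 1, 2, 3) / (2 * np.pi))   # close to 1
print(lhs_71(triangular, u, b, Tj, 1, 1, 2, 3) / (2 * np.pi))  # close to 2/3
```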

Under (K-3), we proceed similarly, but we can no longer use that $K$ has a compact support. Instead, we use that $K$ is bounded and $|K'(t)|\leq c'(1+|t|)^{-p_1}$ for some $p_1>1$ and $c'>0$, and thus, for any $c>0$, as soon as $(c+1)/(bT_j)\leq1$,

$$\big|K^2(s)-K(t)K(t')\big|\leq c''(bT_j)^{-1}\big(2+|uT_j-l|/(bT_j)\big)^{-p_1}\quad\text{for all }s\in I_l\text{ and }|t-s|,\,|t'-s|\leq c/(bT_j)\,.$$

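With (70), and since the length of $\mathcal{T}_j(i,v)\cap\mathcal{T}_j(i',v')$ is $O(T_j)$, we get

$$\begin{aligned}\sum_{l\in\mathcal{T}_j(i,v)\cap\mathcal{T}_j(i',v')}|J'_l-J_l|&\leq(bT_j)^{-1}\sum_{l\in\mathcal{T}_j(i,v)\cap\mathcal{T}_j(i',v')}\ \sup_{s\in I_l}\ \sup_{|t-s|,\,|t'-s|\leq c/(bT_j)}\big|K^2(s)-K(t)K(t')\big|\\&=O\Big((bT_j)^{-2}\sum_{k=0}^{T_j}\big(1+k/(bT_j)\big)^{-p_1}\Big)=O\big((bT_j)^{-1}\big)\,.\end{aligned}$$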



Moreover

$$\sum_{l\in\mathcal{T}_j(i,v)\cap\mathcal{T}_j(i',v')}J_l=\int_{-u'/b}^{u/b}K^2(s)\,ds\to\|K\|_2^2\,,$$

where $u'=[\{2^{-i}(T_{j-i}-v)\}\wedge\{2^{-i'}(T_{j-i'}-v')\}]/T_j-u\to1-u$ by (16). This yields (71) as in the previous case, and thus the same conclusion holds.

Let us now show that Assumption 2(iii) holds under (K-1), (K-2) and (K-3), successively. Under (K-1), we have

$$|\Phi(\lambda;i,v)|=\rho_{j-i,T}^{-1}\Big|\sum_{k=1}^{N}e^{\mathrm{i}k\lambda}\Big|\,,$$

where $N=N_{j,T}$ denotes the number of $l\in\mathcal{T}_j(i,v)$ such that $\mu_{j-i,T}(2^il+v)>0$. Since the Dirichlet kernel satisfies

$$|D_N(\lambda)|=\Big|\sum_{k=1}^{N}e^{\mathrm{i}k\lambda}\Big|=\left|\frac{\sin(N\lambda/2)}{\sin(\lambda/2)}\right|\,,$$

we observe that, for any $\eta>0$, $\sup_{N\geq1}\sup_{\lambda\in[\eta,2\pi-\eta]}|D_N(\lambda)|<\infty$. Hence, with (66) and (68), we obtain Assumption 2(iii).
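The closed form of the Dirichlet kernel and the uniform bound $\sup_{N\geq1}\sup_{\lambda\in[\eta,2\pi-\eta]}|D_N(\lambda)|\leq1/\sin(\eta/2)$, which follows from $|\sin(N\lambda/2)|\leq1$, can be checked numerically. The value $\eta=0.3$ and the range of $N$ below are arbitrary illustrative choices.

```python
import numpy as np


def dirichlet_abs(N, lam):
    """|sum_{k=1}^N exp(i k lam)| via the closed form |sin(N lam/2) / sin(lam/2)|."""
    return np.abs(np.sin(N * lam / 2) / np.sin(lam / 2))


eta = 0.3
lam = np.linspace(eta, 2 * np.pi - eta, 2001)

# Brute-force sum for N = 50 agrees with the closed form.
k = np.arange(1, 51)
direct = np.abs(np.exp(1j * np.outer(lam, k)).sum(axis=1))
print(np.max(np.abs(direct - dirichlet_abs(50, lam))))

# Uniform bound over N on [eta, 2*pi - eta].
worst = max(dirichlet_abs(N, lam).max() for N in range(1, 500))
print(worst, 1 / np.sin(eta / 2))
```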

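Under (K-2) and (K-3), using that $K(t)=(2\pi)^{-1}\int\hat K(\xi)e^{\mathrm{i}t\xi}\,d\xi$, we get

$$\Phi(\lambda;i,v)=(2\pi\rho_{j-i,T})^{-1}\int_{-\infty}^{\infty}\hat K(\xi)\,e^{\mathrm{i}\xi(uT_{j-i}-v)/(bT_{j-i})}\sum_{l\in\tilde{\mathcal{T}}_j(i,v)}e^{-\mathrm{i}l(\lambda+2^i\xi/(bT_{j-i}))}\,d\xi\,,$$

where $\tilde{\mathcal{T}}_j(i,v)$ denotes the set of all $l\in\mathcal{T}_j(i,v)$ such that $\mu_{j-i,T}(2^il+v)$ does not vanish. Denote the length of $\tilde{\mathcal{T}}_j(i,v)$ by $N=N_{j,T}$, as in the previous case. We thus obtain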



$$|\Phi(\lambda;i,v)|\leq(2\pi\rho_{j-i,T})^{-1}\int_{-\infty}^{\infty}|\hat K(\xi)|\,\big|D_N\big(\lambda+2^i\xi/(bT_{j-i})\big)\big|\,d\xi\,.$$
Let $\eta>0$. Splitting the above integral as $\int=\int_{2^i|\xi|/(bT_{j-i})<\eta/2}+\int_{2^i|\xi|/(bT_{j-i})\geq\eta/2}$, we obtain

$$\sup_{\lambda\in[\eta,\pi]}|\Phi(\lambda;i,v)|\leq(2\pi\rho_{j-i,T})^{-1}\|\hat K\|_1\sup_{|\lambda|\in[\eta/2,\pi+\eta/2]}|D_N(\lambda)|+(2\pi\rho_{j-i,T})^{-1}\|D_N\|_\infty\int_{2^i|\xi|/(bT_{j-i})\geq\eta/2}|\hat K(\xi)|\,d\xi\,.$$

Now, for $\eta$ small enough, we have $\sup_{N\geq1}\sup_{|\lambda|\in[\eta/2,\pi+\eta/2]}|D_N(\lambda)|<\infty$ and $\|D_N\|_\infty\leq N$, and, under (K-2), $N=O(bT_j)$ and $\int_{2^i|\xi|/(bT_{j-i})\geq\eta/2}|\hat K(\xi)|\,d\xi=o\big((bT_j)^{-1/2}\big)$, which, with the previous display, (66) and (68), implies Assumption 2(iii). Under (K-3), the same conclusion holds using that $N=O(T_j)$, $\int_{2^i|\xi|/(bT_{j-i})\geq\eta/2}|\hat K(\xi)|\,d\xi=O\big(\exp(-c\,2^{-i-1}\eta\,bT_{j-i})\big)$ and $T_j\exp(-c'bT_{j-i})=O(1)$ with $c'=c\,2^{-i-1}\eta$.

Finally, we show that Assumption 2(iv) holds under (K-1), (K-2) and (K-3), successively. Using the definition (25) and (16), we get, for some positive constant $C$,

$$\Gamma_q(u;j,T)\leq C\rho_{j,T}^{-1}\sum_{k=0}^{T_j-1}K_q\big((uT_j-k)/(bT_j)\big)+o\big(\Gamma_0(u;j,T)\big)\,,$$

where $K_q(x)=K(x)|x|^q$. By definition of $\rho_{j,T}$, one has $\Gamma_0(u;j,T)=O(1)$. Under (K-1) and (K-2), $K_q$ is bounded and compactly supported, so that $\sum_kK_q((uT_j-k)/(bT_j))=O(bT_j)$. This, with (68) and the previous display, implies (33) for all $q>0$. Hence, to conclude the proof, it only remains to show that, for $q=1,2$, under (K-3),

$$\sum_{k=0}^{T_j-1}K_q\big((uT_j-k)/(bT_j)\big)=O(bT_j)\,.$$



Using that $K(x)=O(|x|^{-p_0})$ as $x\to\pm\infty$ and $q\leq2$, we split the sum $\sum_{k=0}^{T_j-1}$ into $\sum_{|uT_j-k|\leq bT_j}$, for which $K_q((uT_j-k)/(bT_j))$ is $O(1)$, and $\sum_{|uT_j-k|>bT_j}$, for which $K_q((uT_j-k)/(bT_j))$ is $O(|(uT_j-k)/(bT_j)|^{2-p_0})$. Hence, we get

$$\sum_{k=0}^{T_j-1}K_q\big((uT_j-k)/(bT_j)\big)=O\Big(bT_j+(bT_j)^{p_0-2}\sum_{l>bT_j-1}l^{2-p_0}\Big)\,.$$

Observing that $bT_j\to\infty$ and $p_0>3$, we obtain the desired bound. □
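Under (K-3) the argument hinges on the tail exponent $p_0>3$. The sketch below uses a hypothetical kernel $K(x)=(1+|x|)^{-4}$ (so $p_0=4$; only the tail condition is illustrated, not the other requirements of (K-3)) and shows that $(bT_j)^{-1}\sum_kK_q((uT_j-k)/(bT_j))$ stays bounded for $q=1,2$ as $bT_j$ grows. The normalized sums approach $\int_{(u-1)/b}^{u/b}K_q$.

```python
import numpy as np


def Kq_sum(q, u, b, Tj, p0=4.0):
    """sum_k K_q((u*Tj - k)/(b*Tj)) with K(x) = (1+|x|)^{-p0} and K_q(x) = K(x)|x|^q."""
    x = (u * Tj - np.arange(Tj)) / (b * Tj)
    return np.sum((1.0 + np.abs(x)) ** (-p0) * np.abs(x) ** q)


u, b = 0.5, 0.02
for q in (1, 2):
    ratios = [Kq_sum(q, u, b, Tj) / (b * Tj) for Tj in (10_000, 100_000, 1_000_000)]
    print(q, ratios)   # bounded in b*Tj
```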

Lemma 5. Suppose that $b_T\to0$ and $T_jb_T\to\infty$. Then, for weights given by (36), Assumption 2 is satisfied with

$$\delta_{j,T}\sim(bT_j)^{-1}\,,\tag{72}$$

$$V(i,v;i',v')=\pi\,2^{-i-i'}\,,\qquad i,i'\geq0,\ v\in\{0,\dots,2^i-1\},\ v'\in\{0,\dots,2^{i'}-1\}\,.\tag{73}$$

Proof. For convenience, we will omit the subscripts $T$ and $j,T$ in this proof when no ambiguity arises. We set $U_j=[uT_j]$ in the following. Using (36), $bT_j\to\infty$, $b\to0$ and $U_j\sim uT_j$, we get that

$$\rho\sim bT_j\,.\tag{74}$$

Observing that $\delta=\rho^{-1}$, we get (72), and Assumption 2(i) follows.
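A quick check of (74). We assume, as the display for $\Gamma_q$ below suggests, that the exponential weights are $\mu(k)=\rho^{-1}e^{-(U_j-1-k)/(bT_j)}$ for $0\leq k\leq U_j-1$, so that $\rho=\sum_{k=0}^{U_j-1}e^{-(U_j-1-k)/(bT_j)}=(1-e^{-U_j/(bT_j)})/(1-e^{-1/(bT_j)})$. This form of (36) is our reading, not a quotation of it.

```python
import numpy as np


def rho_exp(u, b, Tj):
    """Normalizing constant of the (assumed) exponential weights."""
    U = int(np.floor(u * Tj))
    k = np.arange(U)
    return np.exp(-(U - 1 - k) / (b * Tj)).sum()


u = 0.5
for b, Tj in ((0.05, 10_000), (0.01, 100_000), (0.005, 1_000_000)):
    print(b * Tj, rho_exp(u, b, Tj) / (b * Tj))   # tends to 1
```

The ratio behaves like $1+1/(2bT_j)$ once $U_j/(bT_j)=u/b$ is large, which matches $b\to0$ and $bT_j\to\infty$.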
Let us now show that Assumption 2(ii) holds. Using (69), we find that

$$\int_{-\pi}^{\pi}\Phi_{j,T}(\lambda;i,v)\,\overline{\Phi_{j,T}(\lambda;i',v')}\,d\lambda=\frac{2\pi}{\rho_{j-i,T}\,\rho_{j-i',T}}\exp\Big(-\frac{U_{j-i}+v+1}{bT_{j-i}}-\frac{U_{j-i'}+v'+1}{bT_{j-i'}}\Big)\sum_{l=0}^{N-1}e^{l\{2^i/(bT_{j-i})+2^{i'}/(bT_{j-i'})\}}\,,$$



where $N=\{2^{-i}(U_{j-i}-v)\}\wedge\{2^{-i'}(U_{j-i'}-v')\}$. Using (36), (16), $bT_j\to\infty$, $b\to0$ and $U_j\sim uT_j$, we obtain $\rho_{j-i,T}\sim2^i(bT_j)$, $(U_{j-i}+v+1)/(bT_{j-i})\sim u/b$, $2^i/(bT_{j-i})\sim1/(bT_j)$, $N2^i/(bT_{j-i})\sim u/b$, and similar results with $i',v'$ replacing $i,v$. Using these asymptotic equivalences and the previous display, we obtain

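$$\int_{-\pi}^{\pi}\Phi_{j,T}(\lambda;i,v)\,\overline{\Phi_{j,T}(\lambda;i',v')}\,d\lambda=\frac{2\pi\,(A+o(1))}{2^{i+i'+1}\,bT_j}\,,\tag{75}$$

where

$$A=\exp\Big(-\frac{U_{j-i}+v+1}{bT_{j-i}}-\frac{U_{j-i'}+v'+1}{bT_{j-i'}}+N\Big\{\frac{2^i}{bT_{j-i}}+\frac{2^{i'}}{bT_{j-i'}}\Big\}\Big)\,.$$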

Using (16), we have $N=uT_j+O(1)$ and $U_{j-i}+v+1=uT_j2^i+O(1)$. Thus $N2^i-(U_{j-i}+v+1)=O(1)$, and the same holds with $i',v'$ replacing $i,v$. This implies that $A=\exp\big(O((bT_j)^{-1})\big)\to1$. This, (75) and (72) yield Assumption 2(ii) with $V(i,v;i',v')$ defined by (73).
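This step can also be checked numerically. With the same assumed form of the exponential weights, the simplification $T_{j-i}=2^iT_j$ (our choice, consistent with (16) only asymptotically), and (69), the quantity $2^{i+i'}\,bT_j\int\Phi\bar\Phi\,d\lambda$ should approach $\pi$, in line with (72) and (73). Helper names are hypothetical.

```python
import numpy as np


def exp_weights(u, b, n):
    """Assumed exponential weights on a level with n coefficients."""
    U = int(np.floor(u * n))
    w = np.zeros(n)
    w[:U] = np.exp(-(U - 1 - np.arange(U)) / (b * n))
    return w / w.sum()


def scaled_cross(u, b, Tj, i, v, ip, vp):
    """2^{i+i'} * b*Tj * 2*pi * sum_l mu_{j-i}(2^i l + v) mu_{j-i'}(2^{i'} l + v')."""
    mu1 = exp_weights(u, b, 2**i * Tj)
    mu2 = exp_weights(u, b, 2**ip * Tj)
    l = np.arange(Tj)
    cross = np.sum(mu1[2**i * l + v] * mu2[2**ip * l + vp])
    return 2**(i + ip) * b * Tj * 2 * np.pi * cross


print(scaled_cross(0.5, 0.01, 100_000, 1, 0, 2, 3) / np.pi)   # close to 1
```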

We now show that Assumption 2(iii) holds. By setting $N=2^{-i}(U_{j-i}+v)$ and $k=N-l-1$ in (29), we obtain

$$|\Phi(\lambda;i,v)|=\rho^{-1}\Big|\sum_{k=0}^{N-1}e^{-k\{\mathrm{i}\lambda+2^i/(bT_{j-i})\}}\Big|=\rho^{-1}\,\frac{\big|1-e^{-N\{\mathrm{i}\lambda+2^i/(bT_{j-i})\}}\big|}{\big|1-e^{-\mathrm{i}\lambda-2^i/(bT_{j-i})}\big|}\,.$$

Using that $N2^i/(bT_{j-i})\sim u/b\to\infty$, that $\delta^{-1/2}\rho^{-1}\to0$, and that, for any $\eta>0$, $|1-z|$ does not vanish on the compact set of complex numbers $z=re^{\mathrm{i}\theta}$ such that $r\in[0,1]$ and $\eta\leq|\theta|\leq\pi$, and is thus lower bounded on this set, we obtain Assumption 2(iii).
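The two facts used in this step, the geometric-series identity $|\sum_{k=0}^{N-1}z^k|=|1-z^N|/|1-z|$ and the lower bound of $|1-z|$ on $\{re^{\mathrm{i}\theta}: r\in[0,1],\ \eta\leq|\theta|\leq\pi\}$, can be checked numerically. For $\eta\leq\pi/2$ the minimum equals $\sin\eta$, attained at $r=\cos\eta$. The values of $N$, $\lambda$, $a$ (playing the role of $2^i/(bT_{j-i})$) and $\eta$ are illustrative.

```python
import numpy as np


def geom_abs(N, lam, a):
    """|sum_{k=0}^{N-1} z^k| with z = exp(-i*lam - a), computed term by term."""
    z = np.exp(-1j * lam - a)
    return np.abs(np.sum(z ** np.arange(N)))


N, lam, a = 400, 1.1, 0.002
z = np.exp(-1j * lam - a)
closed = abs(1 - z**N) / abs(1 - z)   # geometric-series closed form
print(geom_abs(N, lam, a), closed)

# Lower bound of |1 - r e^{i theta}| over r in [0, 1] and eta <= theta <= pi.
eta = 0.3
r, th = np.meshgrid(np.linspace(0, 1, 501), np.linspace(eta, np.pi, 501))
mn = np.abs(1 - r * np.exp(1j * th)).min()
print(mn, np.sin(eta))
```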

Finally, we show that Assumption 2(iv) holds. By (16), we have, for any $q>0$,

$$\Gamma_q(u;j,T)=\rho^{-1}\sum_{k=0}^{U_j-1}e^{-(U_j-1-k)/(bT_j)}\Big(\frac{U_j-1-k}{bT_j}\Big)^q+o\big(\Gamma_0(u;j,T)\big)\,.$$

Observe that $\Gamma_0(u;j,T)=1$. Setting $l=U_j-1-k$ and splitting the above sum into the terms with $l\leq[qbT_j]+1$, for which we bound the exponential by 1, and those with $l\geq[qbT_j]+2$, for which $x\mapsto e^{-x/(bT_j)}x^q$ is decreasing on $x>l-1$, we get

$$\begin{aligned}\sum_{k=0}^{U_j-1}e^{-(U_j-1-k)/(bT_j)}|U_j-1-k|^q&\leq\sum_{l\leq[qbT_j]+1}l^q+\sum_{l\geq[qbT_j]+2}e^{-l/(bT_j)}l^q\\&\leq O\big((bT_j)^{q+1}\big)+\int_{x>[qbT_j]+1}e^{-x/(bT_j)}x^q\,dx\\&=O\big((bT_j)^{q+1}\big)\,.\end{aligned}$$

The last two displays, (74) and (72) yield (33), which completes the proof. □
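The order $O((bT_j)^{q+1})$ in the last bound is sharp. The rescaled sum $(bT_j)^{-(q+1)}\sum_{l\geq0}e^{-l/(bT_j)}l^q$ is a Riemann sum converging to $\int_0^\infty e^{-x}x^q\,dx=q!$. The sketch below checks this; truncating at $50\,bT_j$ terms mimics $U_j=uT_j$ with $u/b=50$ and is an illustrative choice.

```python
import math

import numpy as np


def scaled_moment(q, bT, L):
    """(bT)^{-(q+1)} * sum_{l=0}^{L-1} exp(-l/bT) * l^q."""
    l = np.arange(L, dtype=float)
    return np.sum(np.exp(-l / bT) * l**q) / bT ** (q + 1)


for q in (1, 2):
    vals = [scaled_moment(q, bT, 50 * int(bT)) for bT in (10.0, 100.0, 1000.0)]
    print(q, vals, math.factorial(q))   # vals approach q!
```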